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Abstract.  This  paper  formulates  and  analyzes  a  pattern  search  method  for  general  constrained  optimization 
based  on  filter  methods  for  step  acceptance.  Roughly,  a  filter  method  accepts  a  step  that  either  improves  the  objec¬ 
tive  function  value  or  the  value  of  some  function  that  measures  the  constraint  violation.  The  new  algorithm  does 
not  compute  or  approximate  any  derivatives,  penalty  constants  or  Lagrange  multipliers.  A  key  feature  of  the  new 
algorithm  is  that  it  preserves  the  useful  division  into  global  SEARCH  and  local  POLL  steps.  It  is  shown  here  that 
the  algorithm  identifies  limit  points  at  which  optimality  conditions  depend  on  local  smoothness  of  the  functions. 
Stronger  optimality  conditions  are  guaranteed  for  smoother  functions.  In  the  absence  of  general  constraints,  the 
proposed  algorithm  and  its  convergence  analysis  generalize  the  previous  work  on  unconstrained,  bound  constrained 
and  linearly  constrained  generalized  pattern  search.  The  algorithm  is  illustrated  on  some  test  examples  and  on  an 
industrial  wing  planform  engineering  design  application. 
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1.  Introduction.  The  optimization  problem  considered  in  this  paper  is 


(1.1) 


min  f{x) 

xex  J  w 

s.t.  C(x)  <  0  , 


where  /  :  X  — »  R.  U  {°°}  and  C  :  X  — ■>  (R  U  are  functions  with  C  =  (ci, . . .  ,cm)T ,  and 

A  is  a  full  dimensional  polyhedron  in  R"  defined  by  finitely  many  nondegenerate  explicit 
bound  and  linear  constraints.  It  is  possible,  for  instance  when  the  functions  are  provided  as 
“black  box”  subroutine  calls,  that  some  constraints  might  be  linear  without  the  knowledge 
of  the  user.  In  that  case,  these  linear  constraints  are  incorporated  in  C(x)  <  0.  The  feasible 
region  defined  by  the  constraints  C(x)  <  0  is  denoted  by  £2.  The  proposed  approach  combines 
aspects  of  filter  algorithms  to  handle  £2,  and  pattern  search  algorithms  to  handle  X. 

Filter  algorithms  were  introduced  by  Fletcher  and  Leyffer  [14]  as  a  way  to  globalize 
sequential  linear  and  quadratic  programming  (SQP  and  SLP)  without  using  any  merit  function 
that  would  require  a  troublesome  parameter  to  be  provided  by  the  user  for  weighting  the 
relative  merits  of  improving  feasibility  and  optimality.  A  filter  algorithm  introduces  a  function 
that  aggregates  constraint  violations  and  then  treats  the  resulting  biobjective  problem.  A 
step  is  accepted  if  it  either  reduces  the  value  of  the  objective  function  or  of  the  constraint 
violation.  Although  this  clearly  is  less  parameter  dependent  than  a  penalty  function,  still  we 
acknowledge  that  specifying  a  constraint  violation  function  implies  assigning  relative  weights 
to  reducing  each  constraint. 

Fletcher  el  al.  [15,  16]  show  convergence  of  the  filter  method  that  uses  SQP  or  SLP 
to  suggest  steps.  In  both  cases,  convergence  is  to  a  point  satisfying  Fritz  John  optimality 


*  Work  of  the  first  author  was  supported  by  FCAR  grant  NC72792,  NSERC  grant  239436-01  and  fellowship  PDF- 
207432-1998  in  part  during  a  post-doctoral  stay  at  Rice  University,  and  both  authors  were  supported  by  AFOSR 
F49620-01-1-0013,  the  Boeing  Company,  Sandia  LG-4253,  ExxonMobil,  the  LANL  Computer  Science  Institute 
(LACSI)  contract  03891-99-23.  and  the  Sandia  Computer  Science  Research  Institute  (CSRI). 

GLRAD  and  Departement  de  Mathematiques  et  de  Genie  Industriel,  Ecole  Polytechnique  de  Montreal, 
C.P  6079,  Succ.  Centre-ville,  Montreal  (Quebec).  H3C  3A7  Canada  (www.gerad.ca/Charles.Audet,  Char¬ 
les.  Audet@gerad.ca) 

^Computational  and  Applied  Mathematics  Department,  Rice  University  -  MS  134,  6100  South  Main  Street, 
Houston.  Texas,  77005-1892  (www.caam.rice.edu/~dennis,  dennis@caam.rice.edu) 


1 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

2006 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-2006  to  00-00-2006 

4.  TITLE  AND  SUBTITLE 

A  Pattern  Search  Filter  Method  for  Nonlinear  Programming  Without 
Derivatives 

5a.  CONTRACT  NUMBER 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Rice  University, Computational  and  Applied  Mathematics 

Department, 6100  South  Main  Street, Houston, TX, 77005- 1892 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

The  original  document  contains  color  images. 

14.  ABSTRACT 

see  report 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 

ABSTRACT 

18.  NUMBER 

OF  PAGES 

28 

19a.  NAME  OF 

RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


2 


CHARLES  AUDET  AND  J.E.  DENNIS  Jr. 


June  12,  2003 


conditions.  Thus,  previous  filter  algorithms  require  explicit  use  of  the  derivatives  of  both  the 
objective  and  the  constraints.  They  also  require  more  than  simple  decrease  of  the  objective 
and  constraint  violation  functions  to  accept  a  step.  Furthermore,  Fritz  John  optimality  is 
weaker  than  the  more  common  Karush-Kuhn-Tucker  conditions,  but,  numerical  results  for 
the  filter  methods  are  very  promising.  It  is  well  to  remember  that  the  theoretical  justification 
for  quasi-Newton  methods  trailed  their  efficacy  by  several  years. 

The  generalized  pattern  search  algorithm  class  (GPS)  designed  by  Torczon  [26]  unifies 
a  wide  class  of  useful  derivative-free  algorithms  for  unconstrained  optimization.  Lewis  and 
Torczon  extended  the  GPS  framework  to  bound  constrained  optimization  [20],  and  more 
generally  [22],  for  problems  with  a  finite  number  of  linear  constraints.  Audet  and  Dennis  [2] 
allow  extended  valued  functions,  which  arise  often  in  practice  (e.g.,  [4,  5]),  and  provide 
an  analysis  that,  among  other  things,  identifies  a  specific  set  of  limit  points  allowing  the 
application  of  Clarke’s  [8]  generalized  derivatives  under  local  Lipschitz  continuity  to  unify, 
strengthen  and  simplify  the  unconstrained  and  simply  constrained  Lewis  and  Torczon  theory. 

Under  the  assumption  that  /  is  continuously  differentiable,  Torczon  [26]  showed  that 
GPS  methods  produce  some  limit  point  for  which  the  gradient  of  the  objective  function  is 
zero,  and  Lewis  and  Torczon  showed  that  their  adaptations  produce  a  Karush-Kuhn-Tucker 
point  for  bound  constrained  [20]  and  linearly  constrained  [22]  problems. 

Subsequences  of  trial  points  that  are  optimizers  in  some  discrete  sense  are  considered 
in  [2],  Even  without  any  assumptions  on  the  smoothness  of  /,  limit  points  of  these  subse¬ 
quences  are  shown  to  exist  under  standard  assumptions.  It  is  also  shown  that  the  following 
intermediate  results  hold  :  If  it  turns  out  that  /  is  Lipschitz  near  a  strictly  feasible  limit  point 
then  Clarke’s  derivatives  are  nonnegative  on  a  set  of  positive  spanning  directions.  A  posi¬ 
tive  spanning  set  is  a  set  of  directions  in  R"  whose  non-negative  linear  combinations  span 
the  whole  space  R" .  Moreover,  if  /  is  strictly  differentiable  (defined  in  Section  5.2)  at  that 
strictly  feasible  point,  then  the  gradient  is  guaranteed  to  be  zero.  Similar  results  are  shown 
when  the  limit  point  is  on  the  boundary  of  the  linearly  constrained  domain. 

Assuming  that  the  functions  /  and  C  are  twice  continuously  differentiable  and  that  the 
constraint  Jacobian  has  full  rank,  Lewis  and  Torczon  [23]  propose  and  analyze  a  derivative- 
free  procedure  to  handle  general  constraints.  In  their  procedure,  GPS  for  bound  constraints 
is  used  to  carry  out  the  specified  inexact  minimizations  of  the  sequence  of  augmented  La- 
grangian  subproblems  formulated  by  Conn,  Gould,  and  Toint  [9],  Our  algorithm  is  a  GPS 
alternative  to  their  method,  for  the  cases  when  one  would  prefer  not  to  assume  continuous 
second  derivatives,  or  when  one  wishes  to  avoid  estimating  penalty  parameters  and  Lagrange 
multipliers,  but  is  willing  to  settle  for  weaker  optimality  conditions.  Thus,  we  do  not  claim 
that  our  method  is  to  be  preferred  for  every  problem. 

One  of  our  objectives  is  to  construct  an  algorithm  that  can  be  used  on  applications  where 
the  objective  and  constraints  are  not  given  analytically,  but  as  “black  boxes”.  For  such  ap¬ 
plications,  a  value  x  £  R"  will  be  used  as  input  to  a  nontrivial  simulation  to  evaluate  f(x) 
and  C(x).  The  subroutine  call  may  fail  to  return  a  value  a  significant  percentage  of  times  it 
is  invoked  [4,  5,  6,  7,  17],  and  even  when  it  succeeds  several  factors  (e.g.,  noise,  numerical 
instability,  modelling  inaccuracy,...),  may  mean  that  one  cannot  construct  accurate  approxi¬ 
mate  derivatives.  Under  these  circumstances,  i.e.,  the  structure  of  the  functions  is  unknown  to 
the  optimizer,  we  will  settle  for  weakened  local  optimality  convergence  results.  The  pattern 
search  filter  algorithm  presented  here  has  the  following  features: 

•  It  is  completely  derivative-free.  It  does  not  use  or  approximate  any  derivative  infor¬ 
mation,  nor  does  it  attempt  to  linearize  any  constraint. 

•  It  transparently  generalizes  GPS  for  unconstrained  problems  and  for  bound  or  linear 
constraints  when  they  are  treated,  as  in  [2,  20,  22],  by  rejecting  points  infeasible  for 
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those  constraints,  and  by  selecting  polling  directions  that  conform  to  the  boundary 
of  the  domain  (see  Definition  4.1). 

•  It  uses  a  step  acceptance  rule  based  on  filter  methods,  so  there  is  no  need  for  any 
penalty  constants  or  Lagrange  multiplier  estimates. 

•  It  makes  assumptions  on  the  problem  functions  /  and  C  that  conform  to  the  practical 
instances  that  interest  us  most  [4,  3,  5].  They  may  be  discontinuous  and  even  take  on 
infinite  values.  Therefore,  no  global  smoothness  assumption  is  justified;  however, 
the  strength  of  the  optimality  conditions  guaranteed  by  the  algorithm  depends  on 
local  smoothness  of  the  functions  at  the  limit  point. 

•  It  preserves  the  desirable  GPS  property  of  requiring  only  simple  decrease,  which  is 
expressed  in  the  present  context  with  respect  to  both  the  objective  and  and  constraint 
violation  functions. 

•  It  does  not  require  any  constraint  qualifications. 

The  Boeing  Design  Explorer  software  uses  the  GPS  filter  algorithm  given  here  as  a  meta 
algorithm  in  the  surrogate  management  framework  for  general  nonlinear  programming  [3], 
Thus,  a  key  issue  for  us  is  that  we  preserve  the  division  into  SEARCH  and  POLL  steps.  The 
SEARCH  steps  we  prefer  make  a  global  exploration  of  the  variable  space,  and  they  might  use 
inexpensive  surrogates  objective  and  constraints.  The  POLL  step  is  a  local  exploration  near  an 
incumbent  point  and  allows  the  theory  to  guarantee  convergence.  Both  the  SEARCH  and  POLL 
steps  are  detailed  in  Section  2.  Another  use  of  our  algorithm,  with  properly  chosen  SEARCH 
steps  is  when  one  would  rather  not  find  only  the  nearby  local  optimizer  usually  found  by 
derivative-based  methods,  but  instead  is  willing  to  use  some  function  evaluations  to  explore 
the  domain  more  thoroughly.  It  is  important  to  keep  in  mind  that  global  optimization  of  black 
box  functions  is  impossible,  even  to  the  point  that  if  one  had  the  global  optimum,  one  would 
not  be  certain  of  it  [25], 

The  paper  is  organized  as  follows.  Sections  2  and  3  give  a  brief  description  of  pattern 
search  and  filter  algorithms.  In  Section  4,  we  present  and  begin  the  analysis  of  a  new  al¬ 
gorithm  that  combines  their  features.  Specifically,  without  any  smoothness  assumptions  on 
the  problem,  we  show  the  existence  of  some  promising  limit  points.  Our  optimality  results 
rely  on  Clarke’s  calculus  with  respect  to  both  the  constraint  violation  and  objective  functions, 
and  on  the  notion  of  contingent  cones,  and  so  we  provide  the  necessary  background  in  Sec¬ 
tion  5.  Section  6  shows  that  if  the  constraint  violation  or  objective  function  is  locally  smooth 
at  such  a  limit  point,  then  some  first  order  optimality  conditions  are  satisfied.  In  the  absence 
of  general  constraints,  the  convergence  results  reduce  to  those  presented  in  [2],  Finally,  in 
Section  7,  we  make  important  points  through  three  examples.  First  we  show  the  value  of  a 
filter  method  as  opposed  to  a  “barrier”  method,  which  rejects  trial  points  that  violate  the  linear 
constraints,  [22,  2].  Second  we  show  the  advantages  for  our  algorithm  of  a  squared  (2  over 
an  £]  measure  of  constraint  violations.  Third  we  show  that  our  main  convergence  result  con¬ 
cerning  the  objective  function  cannot  be  improved  without  removing  some  of  the  flexibility 
of  the  algorithm  or  adding  more  assumptions  about  the  problem.  We  conclude  this  section  by 
applying  our  method  to  a  real  engineering  problem.  These  examples  show  the  strength  and 
limitations  of  our  approach.  The  paper  concludes  by  a  discussion  of  the  significance  of  the 
convergence  results. 

M.  Abramson’s  MatLab  6  implementation  NOMADm  of  a  suite  of  algorithms  including 
the  filter  algorithm  given  here  is  at  http://www.caam.rice.edu/~abramson/NOMADm.html. 
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2.  Pattern  search  algorithms  for  unconstrained  optimization.  The  reader  is  referred 
to  [22,  2]  for  a  thorough  description  of  linearly  constrained  pattern  search  algorithms.  In  the 
present  document,  the  same  notation  as  in  [2]  is  used. 

2.1.  Search  and  poll  steps.  Pattern  search  algorithms  for  unconstrained  minimization 
generate  a  sequence  of  iterates  {x*J  in  R"  with  non-increasing  objective  function  values.  At 
each  iteration,  the  objective  function  is  evaluated  at  a  finite  number  of  points  on  a  mesh  (a 
discrete  subset  of  R"  defined  below)  to  try  to  find  one  that  yields  a  lower  objective  function 
value  than  the  incumbent.  An  incumbent  solution  is  a  trial  point  in  R"  where  the  algorithm 
evaluated  the  objective  function  /  and  has  the  lowest  value  found  so  far.  Any  strategy  may 
be  used  to  select  mesh  points  that  are  to  be  candidates  for  the  next  iteration,  provided  that 
only  a  finite  number  of  points  (possibly  none)  are  selected.  This  is  called  the  SEARCH  step. 
We  favor  SEARCH  procedures  that  choose  candidate  points  independent  of  the  incumbent  and 
whose  global  reach  is  independent  of  the  mesh  size.  We  feel  that  such  SEARCH  procedures 
are  more  likely  to  discover  different  basins  for  the  function  than  the  one  in  which  the  initial 
point  lies. 

When  the  SEARCH  fails  in  finding  an  improved  mesh  point,  the  POLL  step  must  be  in¬ 
voked.  In  that  “fall  back”  step  the  function  value  is  evaluated  at  neighboring  mesh  points 
around  xk .  If  the  POLL  step  also  fails  in  finding  an  improved  mesh  point,  then  xk  is  said  to  be 
a  mesh  local  optimizer.  The  mesh  is  then  refined  and  xk+\  is  set  to  xk.  The  situation  for  our 
constrained  version  is  going  to  be  a  bit  more  complex,  but  is  consistent  in  spirit. 

If  either  the  SEARCH  or  POLL  step  succeeds  in  finding  an  improved  mesh  point  xk+i  /  xk 
with  a  strictly  lower  objective  function  value,  then  the  mesh  size  parameter  is  kept  the  same 
or  increased,  and  the  process  is  reiterated.  Indeed,  as  long  as  improved  mesh  points  are 
found,  one  would  likely  choose  trial  points  on  coarser  meshes.  With  surrogate-based  SEARCH 
steps  [4],  a  great  deal  of  progress  can  often  be  made  with  few  function  values,  and  0(n) 
function  values  are  needed  only  when  the  POLL  step  detects  a  mesh  local  optimizer,  which 
indicates  that  the  mesh  needs  to  be  refined.  We  warn  the  reader  that  there  is  only  a  cursory 
discussion  of  SEARCH  strategies  in  the  present  paper.  The  reason  is  that  since  the  SEARCH 
is  free  of  any  rule,  except  finiteness  and  being  on  the  mesh,  it  cannot  be  used  to  enhance 
the  convergence  theory.  Indeed,  some  examples  in  [1]  exploit  perverse  SEARCH  strategies  to 
show  negative  results.  However,  we  are  willing  to  pay  this  “theoretical”  price  for  the  practical 
reasons  given  above. 

The  formal  definition  of  the  mesh  requires  the  following.  Let  D  be  a  finite  matrix  whose 
columns  in  R"  form  a  positive  spanning  set.  We  use  the  notation  d  £  D  to  indicate  that  d 
is  a  column  of  the  matrix  D.  It  is  also  required  that  each  column  d  £  D  is  the  product  of  a 
non-singular  generating  matrix  G  £  Wl/"  by  some  integer  vector  in  Z".  The  same  generating 
matrix  G  is  used  for  all  directions  d.  See  [1,2]  for  further  insight  on  this  set  Z),  or  see  [26]  for 
the  original  equivalent  formulation.  The  set  valued  function  defines  the  current  mesh 

through  the  lattices  spanned  by  the  columns  of  Z),  centered  around  the  current  iterate  xk: 

(2.1)  M{xk,  Ak)  =  {xk  +  A kDz :  z  £  N"c} , 

where  A*  £  R+  is  the  mesh  size  parameter,  and  njj  is  the  number  of  columns  of  the  matrix  D. 
Note  that  in  Section  4.2  on  the  filter  GPS  algorithm,  we  will  use  a  more  general  definition  of 
the  mesh. 

When  the  SEARCH  fails  in  providing  an  improved  mesh  point,  the  objective  function  must 
be  evaluated  at  the  mesh  points  that  neighbor  the  current  iterate  xk,  the  current  incumbent 
solution.  In  the  unconstrained  case  this  poll  center  is  xk,  the  current  iterate.  This  defines 
the  poll  set  Pk  =  {xk }  U  {xk  +  Akd  :  d  £  Dk}  for  some  positive  spanning  matrix  Dk  C  D.  This 
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notation  means  that  the  columns  of  Dk  are  chosen  front  those  of  D.  We  will  refer  to  evaluating 
f(xk  +  Akd)  as  polling  in  the  direction  d. 

In  the  filter  algorithm  presented  in  Section  4,  we  may  not  poll  about  the  current  iterate, 
but  instead  we  will  poll  about  some  possibly  different  poll  center  with  a  different  definition 
for  the  current  mesh. 

2.2.  Parameter  update.  At  any  iteration,  there  are  two  possible  outcomes,  which  lead 
to  two  sets  of  rules  to  update  the  parameters. 

If  the  iteration  fails  to  produce  an  improved  mesh  point,  then  the  POLL  step  guarantees 
that  Xk  is  a  mesh  local  optimizer.  The  mesh  is  then  refined.  More  precisely, 

(2.2)  A*+i=Tw*At 

with  0  <  xM'A'  <  1,  where  x  >  1  is  a  rational  number  that  remains  constant  over  all  iterations, 
and  W) t  <  —  1  is  an  integer  bounded  below  by  the  constant  w~  <  —  1 . 

If  the  iteration  produces  an  improved  mesh  point,  then  the  mesh  size  parameter  is  kept 
the  same  or  is  increased,  and  the  process  is  reiterated.  The  coarsening  of  the  mesh  follows 
the  rule 

(2.3)  A*+1=xw*At 

where  x  >  1  is  defined  above  and  wk  >  0  is  an  integer  bounded  above  by  w  >  0.  By  modi¬ 
fying  the  mesh  size  parameters  this  way,  it  follows  that  for  any  k  >  0,  there  exists  an  integer 
r*  G  Z  such  that  A*.  =  xr*A<). 

Typical  values  for  the  mesh  parameter  update  are  x  =  2  and  wk  =  —  \  when  the  poll  center 
is  shown  to  be  a  local  mesh  optimizer,  and  wk  =  1  when  an  improved  mesh  point  is  found. 
This  leads  to  setting  A^+i  =  ( A/  when  the  mesh  needs  to  be  refined,  and  A^+i  =  2 A/,  when 
the  mesh  is  coarsened.  An  example  of  the  direction  matrix  might  be  D  —  [/„  —  /„],  where  In 
is  the  n  x  n  identity  matrix.  The  mesh  would  then  be  M(xk,  A*)  =  {xk  +  Akz  :  z  G  Z"}  and  the 
POLL  set  would  be  Pk  =  {jc>}  U  { x k  ±  Akei :  i  =  1,2, , . . ,«},  where  e;  is  the  zth  column  of  the 
identity  matrix.  In  the  case  where  D  is  constructed  from  all  the  columns  of  the  set  {—1,0, 1}", 
the  mesh  is  the  same  as  the  previous  one,  but  the  POLL  set  may  differ.  In  ]R2  for  instance  it 
could  be  Pk  =  {xk,xk  +  Ak(\,0)T  ,xk  +  Ak(0,\)T  ,xk  +  Ak(-l,-\)T}. 

We  borrow  Coope  and  Price’s  [10]  terminology  for  the  following  final  remark  on  GPS. 
These  methods  are  said  to  be  opportunistic  in  the  sense  that  as  soon  as  an  improved  mesh 
point  is  found,  the  current  iteration  may  stop  without  completing  the  function  evaluations  in 
the  SEARCH  and  POLL  steps. 

3.  Filter  algorithms  for  constrained  optimization.  Filter  algorithms  treat  the  opti¬ 
mization  problem  as  biobjective:  one  wishes  to  minimize  both  the  objective  function  /  and  a 
nonnegative  aggregate  constraint  violation  function  h.  Filter  algorithms  attempt  to  minimize 
both  functions,  but  clearly  priority  must  be  given  to  h,  at  least  until  a  feasible  iterate  is  found. 
This  priority  appears  also  in  our  algorithm  in  the  definition  of  the  poll  centers  and  the  poll  set. 
Fletcher  et  al.  [14,  15,  16]  do  this  via  restoration  steps.  Another  difference  is  that  in  keep¬ 
ing  with  pattern  search  algorithms  for  less  general  problems,  we  require  only  improvement 
in  either  /  or  /;,  while  Fletcher  et  al.  have  a  sufficient  decrease  condition  in  the  form  of  an 
envelope  over  the  filter  that  constitutes  a  “sufficiently  unfiltered”  condition. 

The  terminology  used  in  this  paper  differs  slightly  from  that  used  by  Fletcher  et  al.  Our 
notation  is  more  compact  for  our  class  of  algorithms,  and  so  it  simplifies  the  presentation  of 
our  results.  In  addition,  since  our  plan  is  to  provide  a  truly  multiobjective  GPS  algorithm 
in  later  work,  and  since  it  is  likely  to  involve  a  version  of  the  filter,  it  is  best  to  conform  to 
standard  terminology  in  multiobjective  optimization.  [13] 
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Fletcher  et  al.  ’s  definition  of  dominance  makes  it  a  reflexive  relation,  which  simplifies  the 
definition  of  a  filter,  but  we  will  forgo  that  convenience  to  adhere  to  standard  multiobjective 
terminology.  The  point  is  that  the  reader  familiar  with  the  filter  literature  should  read  this 
section  carefully.  We  will  end  up  with  almost  the  standard  notion  of  a  filter,  but  we  will  define 
it  slightly  differently  using  the  standard  multiobjective  notion  of  dominance:  For  a  pair  of 
vectors  w,  w',  with  finite  components,  w  dominates  w'  ,  written  vv  £  w'  ,  if  and  only  if  < 
vt/  ,  and  w  ^  w' .  We  will  use  w  <w'  to  indicate  that  either  w  £  w' ,  or  that  w  =  vv/,  which  is 
the  notion  of  dominance  used  in  earlier  filter  papers. 

The  constraint  violation  function  is  defined  to  satisfy  the  following  properties:  h(x)  >  0, 
h(x)  =  0  if  and  only  if  x  is  feasible,  thus  h{x)  >  0  if  and  only  if  x  is  infeasible,  and  h(x)  =  +°° 
whenever  any  component  of  C(x)  is  infinite.  For  example,  we  could  set  h(x)=  ||C(x)  + 1|  where 
||  •  ||  is  a  vector  norm  and  where  (C(x)+);-  is  set  to  zero  if  c,(x)  <  0  and  to  c,(x)  otherwise, 
i  =  1,2 We  show  in  Section  6.1  that  the  more  locally  smooth  h  is,  the  better  the 
algorithm  is  able  to  exploit  the  positive  spanning  sets  used.  Our  analysis  and  the  examples  in 
Sections  5.2  and  7.2  indicate  that  h{x)  =  ||C(x)  +  ||i  is  a  sound  choice. 

Recall  that  the  feasible  region  of  the  optimization  problem  (1.1)  is  defined  to  be  the 
intersection  of  a  polyhedron  X  and  £1.  Since  it  is  simple  to  remain  feasible  with  respect  to 
X,  we  define  a  second  constraint  violation  function  lix  =  h  4  \|/x ,  where  is  the  indicator 
function  for  X.  It  is  zero  on  X  and  +°°  elsewhere.  We  will  see  in  the  next  section  that 
by  applying  our  pattern  search  filter  algorithm  to  hx  and  /,  the  convergence  results  with 
respect  to  feasibility  will  depend  on  local  smoothness  of  /;,  and  not  of  hx ,  which  is  obviously 
discontinuous  on  the  boundary  of  X. 

There  should  be  no  confusion  in  defining  a  special  meaning  of  dominance  for  the  vector 
arguments  of  our  problem  functions  hx ,  /.  This  will  simplify  our  terminology  rather  than  to 
use  some  other  symbol  such  as  -<{hx,f)-  Thus,  a  point  £  M"  is  said  to  dominate  x  £  R”, 
Xk  £  x  if  and  only  if  (hx(xk),f(xk))T  -5  (hx{x),f(x))T.  Two  points  are  equivalent  if  they 
generate  an  identical  pair  of  hx  and  /  values.  As  above,  x  A  x'  indicates  that  either  x  £  x\  or 
x  and  x'  are  equivalent. 

A  filter  is  a  finite  set  of  infeasible  points  in  R"  such  that  no  pair  x,x'  in  the  filter  are 
in  the  relation  x  £  x' .  A  point  x'  is  said  to  be  filtered  if  either  x'  A  x  for  some  x  £  J- ,  or  if 
hx  (xr)  >  hfnax  for  some  positive  finite  upper  bound  h,nax  on  allowable  aggregate  infeasibility, 
or  if  x  is  feasible  and  f(x)  >  fF  (i.e.,  the  least  function  value  found  so  far  at  a  feasible  point). 
The  point  x  is  unfiltered  otherwise.  The  set  of  filtered  points  ‘J  is  denoted  in  standard  notation 
as: 

(3.1)  5  =  U  {x'  '■  x'  -  u  W  '■  hx(x)  >  hmax}  U  {x  :  hx{x')  =  0  ,f(x')  >  fF}. 

xetF 

Observe  that  our  notation  implies  that  trial  points  tying  an  incumbent’s  /  and  h  values  are 
filtered,  and  thus  are  not  improved  mesh  points  However,  they  will  be  poll  center  candidates. 

Unfiltered  points  are  added  to  as  they  are  generated,  and  filtered  ones  are  rejected. 
Whether  a  point  is  filtered  can  depend  on  when  it  is  generated.  This  temporal  property  causes 
“blocking  entries”  [14],  In  order  to  avoid  the  problem  of  blocking  entries,  the  filter  contains 
only  infeasible  points.  The  incumbent  best  feasible  point  is  treated  separately  as  a  sort  of 
single  point  filter  that  can  only  filter  other  feasible  points.  The  reason  for  this  separation  is 
to  encourage  moving  over  an  infeasible  function  ridge  and  approaching  a  different  part  of  the 
feasible  region. 
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4.  A  pattern  search  filter  algorithm  for  constrained  optimization.  In  the  previous 
sections  we  presented  the  filter  framework  for  general  constraints,  and  the  GPS  algorithm 
for  unconstrained  optimization.  We  now  present  a  GPS  filter  method  for  the  optimization 
problem  (1.1).  When  some  of  the  constraints  are  known  to  be  linear,  i.e.,  when  X  is  not 
trivial,  it  is  frequently  advantageous  to  treat  them  separately  from  the  others  and  to  ask  that 
every  iterate  belongs  to  X.  This  is  especially  true  of  linear  equality  or  bound  constraints.  For 
any  derivative-free  algorithm,  one  should  surely  use  linear  equality  constraints  to  eliminate 
variables,  and  this  is  desirable  for  nonlinear  equality  constraints  if  it  is  practical. 

4.1.  Bound  and  linear  constraints.  By  applying  the  algorithm  to  hx  instead  of  h,  any 
trial  point  outside  of  X  is  rejected  since  its  constraint  violation  function  value  is  larger  than 
h,nax ■  We  called  this  the  “barrier”  approach  for  A.  In  the  absence  of  general  constraints  [2], 
the  indicator  function  is  added  to  /  instead  of  h.  In  the  present  work,  we  cannot  add  it  to  / 
since  the  trial  points  xGX  for  which  —  °°  <  f(x)  <  °°  and  0  <  h(x)  <  h(x')  for  all  x'  G  ¥  are 
unfiltered. 

In  addition,  the  fact  that  some  linear  constraints  are  explicitly  known  must  be  used  to 
select  mesh  directions  that  take  into  account  the  geometry  of  the  region  X,  just  as  suggested 
in  [20,  22,  2]:  When  the  poll  center  is  within  a  given  tolerance  e  >  0  of  the  boundary  of  X, 
then  the  positive  spanning  directions  /!/  that  define  the  poll  set  are  chosen  to  contain  the  ones 
that  span  the  tangent  cone  Tx  (y)  to  X  at  all  boundary  points  y  within  the  tolerance  e.  The 
formal  definition  is  as  follows: 

DEFINITION  4.1.  A  rule  for  selecting  the  positive  spanning  sets  D ^  =  D(k,xf)  C  D 
conforms  to  X  for  some  e  >  0,  if  at  each  iteration  k  and  for  each  y  in  the  boundary  ofX  for 
which  Hy-XjtH  <  e,  Tx(y)  is  generated  by  a  nonnegative  linear  combinations  of  the  columns 
of  a  subset  D\  ofDi. 

These  tangent  cone  directions  should  be  added  to  D a  before  getting  too  close  to  the 
boundary,  i.e .,  it  is  best  in  our  experience  not  to  take  the  tolerance  e  too  small. 

4.2.  Meshes  and  poll  centers.  In  our  proposed  pattern  search  algorithm,  the  test  for 
accepting  an  improved  mesh  point  is  not  based  solely  on  the  decrease  of  the  objective  function 
value  when  there  are  constraints.  Therefore,  the  terminology  improved  mesh  point  (used 
in  the  unconstrained  case)  is  not  suited  in  a  biobjective  context.  Instead,  we  will  use  the 
terminology  unfiltered  mesh  point  when  either  the  SEARCH  or  POLL  step  finds  a  mesh  point 
that  is  not  filtered.  If  both  steps  fail  in  finding  an  unfiltered  mesh  point,  then  we  cannot  say 
that  the  poll  center  is  a  mesh  local  optimizer  (as  in  the  unconstrained  case),  instead  we  will 
say  that  the  poll  center,  which  will  be  chosen  to  be  one  of  two  special  points  in  the  filter  or 
else  a  point  that  ties  one  of  them,  is  a  mesh  isolated  filter  point  since  its  mesh  neighbors  (the 
points  in  the  poll  set)  are  all  filtered. 

As  in  the  pattern  search  algorithms  presented  in  Section  2,  the  SEARCH  and  POLL  steps 
are  opportunistic,  and  may  be  terminated  without  any  more  function  evaluations  when  an 
unfiltered  mesh  point  is  found.  The  mesh  size  parameter  is  then  either  increased  or  kept 
constant  according  to  rule  (2.3).  When  no  such  point  is  found,  the  poll  center  is  a  mesh 
isolated  filter  point  and  the  filter  remains  unmodified.  The  mesh  size  parameter  is  decreased 
according  to  rule  (2.2).  Unlike  Fletcher  et  aids  filter  algorithms,  there  is  no  “envelope”  added 
to  the  filter  to  guarantee  a  form  of  sufficient  decrease. 

We  define  two  types  of  incumbents:  the  feasible  ones,  and  the  infeasible  ones  with  min¬ 
imal  constraint  violation.  Let  fk  represent  the  feasible  incumbent  value,  i.e.,  the  smallest 
objective  function  value  (for  feasible  points)  found  by  the  algorithm  up  to  iteration  k  If  no 
feasible  point  has  been  found,  fk  is  set  at  Let  h[  >  0  be  the  least  positive  constraint  vio¬ 
lation  function  value  found  up  to  iteration  k,  and  let  f[.  denote  the  smallest  objective  function 
value  of  the  points  found  whose  constraint  violation  function  values  are  equal  to  /;[.  If  no 
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such  point  exists,  or  if  /;[  >  hmax  then  hi  is  fixed  at  hmax  and  f!  at  — The  superscript  F 
stands  for  feasible  and  I  for  infeasible.  Figure  4.1  shows  an  example  of  a  filter.  Any  x  £ 
found  by  the  algorithm,  is  a  feasible  incumbent  point  if  fix)  =  fF,  and  it  is  an  infeasible 
incumbent  point  if  h(x)  =  hi  and  f(x)  =  //. 


/ 


fF 

h 


7k 


Feasible  region: 

Trial  point: 

Filtered  trial  point: 
Unfiltered  mesh  point: 
Mesh  isolated  filter  point: 


hx 


=  hx(x)  =  0} 

Ik  G  4/  (xk ,  A* ) 

tk  G  7k 
tk&Tk 

Pk  c  7k 


Fig.  4.1.  Feasible  incumbent  set:  |.r  E  It  X  Sk  :  fix)  =  fF}.  Infeasible  incumbent  set  {.teXOSt:  h(x)  = 
**>  /(*)=/*}• 

Our  filter  algorithm  does  not  have  a  sufficient  decrease  condition.  Any  unfiltered  mesh 
point  (i.e.,  for  which  x  f  7  -  see  equation  (3.1))  is  a  candidate  for  being  a  new  iterate.  Our 
convergence  results  are  for  the  iterates  that  are  incumbents.  Still,  iterate  is  a  useful  term  in  an 
implementation  because  a  new  iterate  means  a  new  unfiltered  mesh  point  has  been  found  and 
the  user  can  continue  to  search  on  the  current  mesh  as  in  [3]  without  resorting  to  a  POLL  step. 

In  the  unconstrained  case,  there  was  a  single  type  of  incumbent  solution.  Therefore  the 
mesh  was  necessarily  constructed  around  it.  We  redefine  the  current  mesh  so  that  it  contains 
more  points,  therefore  allowing  more  flexibility  to  the  algorithm.  Recall  that  the  mesh  is 
conceptual  in  the  sense  that  it  is  never  actually  constructed,  and  its  sole  purpose  is  to  allow 
us  to  capture  some  structure  about  the  trial  points  so  that  we  can  derive  some  convergence 
results.  Let  So  be  the  set  of  initial  trial  points  provided  by  the  user  at  which  the  function 
values  are  computed.  We  assume  that  at  least  one  point  of  So  has  a  constraint  violation 
function  value  less  than  hmax  and  that  every  element  of  So  lies  on  Mo(x,Ao)  (see  (2.1))  for 
some  x  £  So-  Define  5'/,  to  be  the  set  of  points  where  the  functions  were  evaluated  by  the  start 
of  iteration  k.  The  mesh  is  now  defined  as  the  following  union: 

(4.1)  M(Sk,Ak)  =  U  *)  , 

xeSk 

where  M(x,Ak)  is  defined  in  equation  (2.1).  This  new  definition  allows  more  flexibility  to  the 
user.  For  example,  the  SEARCH  step  is  now  allowed  to  poll  around  any  trial  point  x  in  Sk,  i.e., 
to  evaluate  the  functions  at  points  from  the  set  {x  +  Akd  :  d  £  D}. 

The  poll  center  pk ,  the  point  around  which  the  poll  set  is  constructed,  is  chosen  either 
in  the  set  of  feasible  incumbents  or  to  be  one  of  the  infeasible  incumbents  with  minimal 
constraint  violation.  Note  that  when  these  sets  of  incumbents  are  non-empty,  they  each  will 
usually  be  composed  of  a  single  element.  Thus,  the  poll  center  either  satisfies 

(4.2)  ( hx(Pk),f(Pk ))  =  (0 ,fk)  or  ( hx(pk),f(pk ))  =  • 

The  poll  set  is  the  poll  center  pk  together  with  its  mesh  neighbors: 

(4.3)  Pk  =  {pk}  U  {pk  +  A kd  :  d  £  Dk}  . 
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The  positive  spanning  matrix  D *  is  composed  of  columns  of  D  and  conforms  to  the  boundary 
of  the  linear  constraints  for  an  e  >  0  (see  Definition  4.1).  The  current  iterate  Xk  is  typically 
used  to  seed  the  SEARCH  step,  and  the  poll  center  is  always  used  for  the  POLL  step. 

Our  class  of  algorithms  and  their  analysis  are  completely  flexible  about  the  choice  be¬ 
tween  these  two  sets  of  poll  centers.  The  user  may  supply  a  strategy  to  select  a  poll  center.  In 
Section  7,  we  give  what  we  hope  are  convincing  arguments  not  to  always  make  one  choice  or 
the  other.  Indeed,  always  choosing  a  stationary  feasible  poll  center  makes  the  filter  algorithm 
essentially  reduce  to  the  barrier  approach,  which  is  not  indicated  for  constraints  where  the 
polling  directions  do  not  conform  to  the  feasible  region.  Still,  we  prefer  neither  to  prescribe 
nor  to  proscribe  any  choice  rule  for  poll  centers.  This  flexibility  may  seem  tedious  to  the 
reader,  but  the  user  may  have  a  clear  preference  based  on  the  results  of  the  SEARCH,  which 
may  have  involved  some  strategy  that  makes  it  unlikely  that  one  or  the  other  choice  would 
be  successful.  For  example,  surrogate  polling  around  several  filter  points  in  the  SEARCH  step 
seems  promising  in  early  tests,  and  the  results  would  be  likely  to  influence  the  choice  of  poll 
center. 

Even  if  we  already  have  a  feasible  incumbent  point,  we  may  wish  to  poll  around  one  of 
the  least  infeasible  points,  which  might  have  a  lower  objective  function  value,  in  order  to  try 
to  find  and  explore  a  different  part  of  the  feasible  region  O.  Also,  this  is  what  allows  our 
filter  algorithm  to  avoid  stalling  in  the  Lewis  and  Torczon  [20]  example  when  those  linear 
constraints  are  treated  by  the  filter.  This  is  illustrated  in  Section  7.1. 

4.3.  Description  of  the  algorithm.  At  any  iteration,  three  types  of  unfiltered  mesh 
points  Xk+\  £  M(S, t,  At)  can  be  generated  by  the  algorithm:  The  most  useful  ones  are  the  unfil¬ 
tered  feasible  mesh  points.  They  improve  the  feasible  incumbent  value  to  f[+l  =  f(xk+ 1)  < 
ff  .  Next  are  the  infeasible  ones  that  improve  the  infeasible  incumbent  with  minimal  con¬ 
straint  violation:  0  <  hIk+l  =  hx{xk+ 1)  <  hk  and  fk+l  =  /(x*+i ).  Finally  there  are  the  other 
infeasible  ones  that  add  some  elements  to  the  filter,  but  leave  the  incumbents  unchanged.  In 
all  three  cases,  the  mesh  size  parameter  is  updated  according  to  rule  (2.3),  with  possibly  some 
different  values  of  vvj.. 

To  check  if  a  trial  point  x  is  filtered  or  not,  the  following  strategy  is  used  in  order  to  avoid 
wasting  expensive  function  evaluations  of  /  and  C.  First  \|/x  (x)  is  evaluated  by  determining 
if  x  belongs  to  X.  If  not,  then  x  is  filtered  and  the  evaluation  of  f(x)  and  C(x)  is  avoided. 
Second,  it  is  possible  that  partial  information  on  C(x)  allows  the  algorithm  to  conclude  that 
x  is  filtered.  For  example,  if  h(x)  =  ||C(x)  +  ||o  and  if  it  known  that  Lf=  i  k/W+|2  >  hmax 
for  some  index  p  <  m,  then  the  evaluation  of  f(x')  and  c,(x)  for  i  =  p  +  l,p  +  2, . . .  ,m  is 
not  necessary.  Similar  observations  hold  if  f(x')  and  partial  information  on  C(x)  are  known, 
though  this  situation  is  more  complicated  since  the  value  of  f(x)  alone  can  not  allow  us  to 
conclude  that  x  is  filtered  without  at  least  partial  knowledge  of  h(x). 

When  all  trial  points  are  filtered,  then  the  poll  center  pi-  is  a  mesh  isolated  filter  point. 
Regardless  of  the  feasibility  of  x*,,  the  next  iterate  x^+ 1  is  set  to  xj,,  and  the  mesh  size  parameter 
is  decreased  according  to  rule  (2.2).  The  next  poll  center  p^+i  need  not  be  fixed  to  p/c.  These 
iterations  usually  require  more  function  evaluations  than  when  an  unfiltered  mesh  point  is 
found.  A  sensible  strategy  is  to  poll  around  both  incumbents  before  decreasing  the  mesh 
size  parameter.  Logically,  one  can  declare  that  the  first  POLL  step  was  actually  a  part  of  the 
SEARCH. 

A  typical  way  in  updating  the  mesh  size  parameter  is  to  double  it  when  a  new  incumbent 
solution  is  found,  otherwise  to  keep  it  constant  when  only  an  unfiltered  mesh  point  is  found, 
and  to  cut  it  in  half  when  the  poll  center  is  shown  to  be  a  mesh  isolated  filter  point. 

Our  algorithm  for  constrained  optimization  is  formally  stated  in  figure  4.2.  We  allow 
for  the  fact  that  in  some  applications,  a  set  So  of  initial  points  may  be  available  from  solving 
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similar  problems,  and  can  be  used  to  seed  the  filter.  Without  any  loss  of  generality  we  assume 
that  any  such  points,  or  at  least  the  undominated  ones,  are  on  the  initial  mesh  and  that  they 
have  been  “filtered”  to  be  consistent  with  our  initialization  step  in  the  sense  that  xq  will  not  be 
filtered  by  the  other  seed  points.  An  easy  way  to  assure  this  would  be  to  take  xq  to  be  the  seed 
point  with  the  smallest  value  of  hx,  to  break  ties  by  taking  one  with  the  smallest  objective 
function  value,  and  to  make  sure  that  the  necessary  directions  are  in  D  in  order  that  all  the 
initial  filter  points  are  on  the  mesh.  Of  course,  one  must  ensure  that  the  directions  satisfy  the 
conditions  of  Section  2. 


A  pattern  search  filter  algorithm 

•  Initialization:  Let  Jo  be  the  filter  associated  with  a  set  of  initial  points  So-  Let 
x'o  be  an  undominated  point  of  Jo-  Fix  Ao  >  0  and  set  the  iteration  counter  k  to  0. 

•  Definition  of  incumbent  points:  Define  (if  possible) 

f!' :  the  least  objective  function  value  for  all  feasible  points  found  so  far; 
h[  >  0:  the  least  positive  constraint  violation  function  value  found  so  far; 

//:  the  least  objective  function  value  of  the  points  found  so  far  whose  constraint 
violation  function  value  is  equal  to  h'k. 

•  Search  and  poll  on  current  mesh  M(Sk,Ak)  (see  equation  (4.1)):  Perform 
the  SEARCH  and  possibly  the  POLL  steps  (or  only  part  of  the  steps)  until  an  unfil¬ 
tered  trial  point  xk.\  i  is  found,  or  when  it  is  shown  that  all  trial  points  are  filtered 

by  Jk- 

-  Search  step:  Evaluate  hx  and  /  on  a  set  of  trial  points  on  the  current  mesh 
M(SklAk)  (the  strategy  that  gives  the  set  of  points  is  usually  provided  by  the 
user). 

-  Poll  step:  Evaluate  hx  and  /  on  the  poll  sets  Pk  (see  equation  (4.3)),  for  a 
poll  center  pk  that  satisfy  equation  (4.2). 

•  Parameter  update:  Let  Sk  \ 1  =  Sk  U  {  the  set  of  all  trial  points  visited  in  the 
SEARCH  and  POLL  steps  }.  If  the  SEARCH  or  the  POLL  step  produced  an  unfiltered 
mesh  point  xk+\  ^  Jk,  then  update  Ak+\  >  Ak  according  to  rule  (2.3),  then  update 
the  filter  at  the  next  step. 

Otherwise,  set  xk+\  =  xk,  update  Aj+i  <  Ak  according  to  rule  (2.2),  and  set  Jk+\  = 
Jk',  increase  k  <—  k+  1  and  go  back  the  definition  of  the  incumbents. 

•  Filter  update:  Let  Jk+\  be  the  union  of  Jk  with  all  infeasible  unfiltered  points 
(with  respect  to  Jk)  found  during  the  SEARCH  and  POLL  step.  Remove  domi¬ 
nated  points  from  Jk+i-  Increase  k  <—  k+  1  and  go  back  to  the  definition  of  the 
incumbents. 


FIG.  4.2.  A  pattern  search  filter  algorithm 


In  pattern  search  algorithms,  one  role  of  the  POLL  step  is  to  guarantee  convergence.  This 
is  why  it  is  rigidly  defined  through  the  positive  spanning  sets  Dk  C  D.  In  practice,  the  largest 
improvements  in  the  incumbent  points  are  obtained  in  the  SEARCH  step  (e.g.,  see  [3,  4,  5] 
where  a  surrogate  of  an  expensive  function  is  constructed).  The  SEARCH  step  is  usually  the 
one  that  drives  the  iterates  away  from  a  local  optimum.  In  a  SEARCH  implementation,  it 
might  be  a  good  idea  to  try  some  points  that  are  near  points  of  the  filter.  Paul  Frank  made  the 
interesting  suggestion  that  SEARCH  might  include  polling  around  the  next  most  feasible  filter 
point,  i.e.,  x  £  Jk  with  the  least  value  of  hx(x)  >  h1 .  The  objective  here  again  is  to  attempt  to 
find  and  then  explore  a  different  part  of  the  feasible  region.  This  is  illustrated  by  the  example 
in  Section  7.4. 

In  the  next  section,  we  discuss  the  reduction  of  the  algorithm  proposed  here  in  the  ab- 
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sence  of  nonlinear  constraints  to  those  given  earlier  for  unconstrained  and  linearly  constrained 
problems. 

4.4.  Reduction  of  the  GPS-filter  method  to  linearly  constrained  optimization.  Con¬ 
sider  the  case  where  m  =  0,  i.e.,  when  there  are  no  nonlinear  constraints.  In  [2],  linear  con¬ 
straints  defining  X  were  handled  by  adding  the  indicator  function  to  /  and  in  the  present 
paper,  it  is  added  to  h.  The  effect  is  the  same,  since  in  both  cases  the  indicator  function  sim¬ 
ply  eliminates  from  consideration  the  infeasible  points  with  respect  to  X.  In  both  cases,  the 
convergence  results  are  relative  to  the  smoothness  of  h  and  /  and  not  of  lix  and  fx. 

The  main  result  of  [2]  was  that  for  the  limit  x  of  a  refining  subsequence  {pkjkeK,  Clarke 
derivatives  are  nonnegative  in  all  the  unsuccessful  polling  directions  used  infinitely  often  in 
the  subsequence.  The  analysis  below  generalizes  this  by  identifying  a  large  set  of  directions 
for  which  Clarke  derivatives  are  nonnegative. 

4.5.  Infinite  refinement  of  the  mesh.  In  this  section,  we  identify  a  set  of  limit  points 
of  the  algorithm  that  exists  even  without  assuming  any  problem  smoothness.  Later  we  will 
show  some  optimality  conditions  hold  at  these  limit  points  for  which  the  problem  is  locally 
smooth. 

The  convergence  analysis  of  our  algorithm  is  based  on  the  standard  (see  [9,  14,  15,  16]) 
assumption  that  all  trial  points  produced  by  the  algorithm  lie  in  a  compact  set.  A  consequence 
of  this  is  that  since  the  mesh  size  parameter  does  not  decrease  when  an  unfiltered  mesh  point 
is  found  (A*_|_i  >  Ak),  then  it  follows  that  only  finitely  many  consecutive  unfiltered  mesh 
points  can  be  generated. 

We  will  be  mainly  concerned  with  the  poll  centers  p k  that  are  mesh  isolated  filter  points 
(i.e.,  the  mesh  neighbors  of  pk  are  filtered)  and  for  which  the  mesh  size  parameter  is  reduced 
(Ajt+i  <  A/J.  The  proofs  of  the  results  in  this  subsections  are  omitted,  even  if  the  definition 
of  the  mesh  is  slightly  different.  The  key  element  required  is  not  the  mesh,  but  the  fact  that 
any  mesh  point  x  £  M(Sk,  A*)  can  be  written  as  x  +  £f=0  A ,-Dzi  for  some  x  £  So  and  z,j  £  N"D, 
for  i  =  0, 1, . . .  ,k. 

Our  first  result  is  that  there  is  a  subsequence  of  iterations  for  which  the  mesh  size  pa¬ 
rameter  goes  to  zero.  In  order  to  prove  it  we  require  the  following  lemma  from  Torczon  [26] 
or  Audet  and  Dennis  [2]  whose  proof  can  be  easily  modified  with  our  definition  (4.1)  of  the 
mesh. 

LEMMA  4.2.  The  mesh  size  parameters  Ak  are  bounded  above  by  a  positive  constant 
independent  of  the  iteration  number  k. 

Combining  this  lemma  with  the  assumption  that  all  iterates  lie  in  a  compact  set  implies 
the  following  result.  Its  proof  is  omitted  since  it  is  identical  to  that  of  the  same  result  in  [2], 
The  original  proof  of  this,  using  slightly  different  notation  can  be  found  in  Torczon  [26]. 

LEMMA  4.3.  The  mesh  size  parameters  satisfy  lim  inf  A*  =  0. 

k— >+oo 

Coope  and  Price  [10]  analyze  mesh-based  algorithms  for  the  unconstrained  and  linearly 
constrained  problems  in  which  instead  of  requiring  that  the  SEARCH  be  performed  on  the 
mesh,  they  assume  that  the  limit  inferior  of  the  mesh  size  parameter  goes  to  zero.  This  shifts 
the  burden  from  the  algorithm  specification  to  the  implementation. 

Since  the  mesh  size  parameter  shrinks  only  at  mesh  isolated  filter  points.  Lemma  4.3 
guarantees  that  there  are  infinitely  many  iterations  for  which  the  poll  centers  are  mesh  isolated 
filter  points.  Thus  by  compactness,  the  mesh  isolated  filter  points  have  limit  points.  Moreover, 
all  these  limit  points  belong  to  the  polyhedron  X.  At  such  an  iteration  the  entire  trial  set,  and 
in  particular  the  poll  set  l\,  is  filtered.  Therefore,  for  each  direction  d  £  Dk  either  hx(pk  + 
A kd)  >  hmax,  or  there  exists  some  element  x  in  the  filter  Jk  such  that  both  f(pk  +  Akd)  >  f(x) 
and  hx(pk+Akd)  >  hx (x)  or,  hx(pk  +  Akd)  =  0  and  fx(pk  +  Akd)  >  f[. 
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5.  Background  for  optimality  results.  As  in  [2],  Clarke’s  [8]  generalized  derivatives 
are  the  key  to  our  convergence  analysis.  To  use  this  powerful  tool,  we  analyze  the  case  where 
the  function  is  Lipschitz  in  a  neighborhood  of  the  limit  point  in  question.  Of  course,  there 
are  some  optimization  problems  on  which  we  would  apply  our  algorithm  where  the  functions 
are  not  Lipschitz,  or  optimization  problems  where  we  cannot  show  that  the  functions  are 
Lipschitz.  But  this  is  beside  the  point.  We  show  how  the  algorithm  behaves  on  problems 
with  Lischitz  functions.  Another  ingredient  needed  for  optimality  conditions  is  the  contingent 
cone,  which  generalizes  the  notion  of  tangent  cone  to  more  general  constraints.  The  following 
material  is  adapted  from  [18,  24]. 

DEFINITION  5.1.  Let  S  C  R"  be  nonempty.  The  cone  generated  by  S  is 
(5.1)  cone(S)  =  {  Xs  \  X  >  0  and  s  g  S  }  . 

A  tangent  vector  to  S  at  x  in  the  closure  of  S  is  v  g  R"  such  that  there  exists  a  sequence 
{jA-}  of  elements  of  S  that  converges  to  x  and  a  sequence  of  positive  real  numbers  {At} 
for  which  v  =  limtAt(yt  —  x).  The  set  T(S,x)  of  all  tangent  vectors  to  S  at  x  is  called  the 
contingent  cone  (or  sequential  Bouligand  tangent  cone )  to  S  at  x.  The  polar  cone  of  a  cone 
K  c  R”  is  K°  =  {x  g  R"  :  xTv  <  0  ,Vv  g  K}. 

For  X,  the  contingent  cone  is  the  same  as  the  tangent  cone.  The  normal  cone,  used  to 
define  KKT  points,  is  less  useful  here  than  the  polar  cone  since  the  normal  cone  in  our  context 
may  have  little  to  do  with  optimality  given  its  usual  definition  as  the  convex  conic  hull  of  the 
gradients  of  the  constraints.  The  polar  cone  of  the  contingent  cone  is  more  useful  in  this 
context. 

Optimality  conditions  for  a  differentiable  function  can  be  stated  in  terms  of  the  cone 
generated  by  the  convex  hull  of  a  set  S,  i.e.,  the  set  of  nonnegative  linear  combinations  of 
elements  of  S.  We  will  use  the  standard  notation  co(S)  for  the  convex  hull  of  .S',  but  rather 
than  use  the  induced  but  somewhat  unwieldy  notation  cone(co(S)),  we  will  use  the  notation 
cc(S)  for  the  convex  conic  hull  of  S.  Thus,  for  example,  to  say  that  a  set  S  is  a  positive 
spanning  set  is  to  say  that  cc(S)  =  R”. 

DEFINITION  5.2.  [8]  Let  g  :  R"  — >  R  be  Lipschitz  near  x  g  R".  Clarke’s  generalized 
derivative  at  x  in  the  direction  v  g  R"  is 

g  (x;  v)  :=  hmsup - . 

y— >x,  t\ o  f 

The  generalized  gradient  of  g  at  x  is  the  set 

dg(x)  :=  {s  g  R"  :  g°(x;v)  >  vTs  for  all  vgR"}. 

The  generalized  derivative  may  be  obtained  from  the  generalized  gradient  as  follows  :  g°  (x;  v)  = 
max{i'rs  :  s  g  3g(x)}. 

The  following  alternate  definition  of  directional  derivative  will  be  useful. 

Lemma  5.3.  Let  g  :  R"  — >  R  be  Lipschitz  near  x  g  R".  Then, 

0(-  \  g(y+tw)-g(y) 

g  (x;v)  =  limsup  - . 

y^x,  w — ►v,  fj,0  ^ 

Proof.  Let  L  be  a  Lipschitz  constant  for  g  near  x.  Then 

limsup  =  limsup  +  g(y+>W)-g(y+,v) 

y— >x,  w — >v,  f j.0  y^-x,  w — >v,  fj.0 

<  limsup  Slz+MrlM  +L||w-y||  =g°(x;v)  . 

LK,  W — >V,  f  j.0 
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On  the  other  hand,  setting  w  =  v  gives  a  lower  bound  on  the  limit  supremum. 

limsup  s(y+tw)-g(y)  >  iimSllplM  =  g°(x-,v)  . 
y— >x,  w— »v,  t[0  7— >x,  rjO 


In  order  to  show  that  the  Clarke  generalized  directional  derivative  is  nonnegative  at  a 
point  Jc  £  R"  in  the  direction  v  £  R”,  it  suffices  to  generate  three  subsequences:  {ja-}  converg¬ 
ing  to  x,  {wk}  converging  to  v,  and  {4}  converging  to  zero  from  above  in  such  a  way  that 
g(}’k)  <  g(yk  +  tkWk )  for  infinitely  many  k’s. 

5.1.  Clarke’s  derivatives  at  limit  points.  In  this  subsection,  we  develop  some  results 
about  the  directions  in  which  the  Clarke  derivatives  indicate  optimality.  To  save  space,  we 
prove  our  preliminary  results  for  a  function  g,  which  can  be  either  h  or  /.  Also,  for  any 
x  £  R",  we  define  rg(x)  to  be  the  closure  of  {x  £  R"  :  g(x  )  >£(■*)}• 

PROPOSITION  5.4.  Let  S  C  R"  be  nonempty,  and g  be  defined  on  an  open  superset  of  S, 
and  let  g  be  Lipschitz  near  x  £  S.  Necessary  conditions  for  x  to  be  a  local  minimizer  of  g  on 
S  are: 

•  g°(x;v)  >  0  for  every  v  £  T(S,x); 

•  If  g  has  a  Frechet  derivative  Vg(x)  at  x,  then  Vg(x)rv  >  0  for  every  v  £  co(T (S,x)), 
and  so  —  Vg(x)  £  co(T(S,x))°.  Thus,  ifT{S,x)  contains  a  positive  spanning  set, 
then  co(T(S,x))°  =  {0}  andVg(x)  =0. 

Proof  Let  S,g  and  jc  be  as  in  the  statement,  and  let  v  be  in  T(S\x).  Then,  there  exists 
a  sequence  {.4}  of  elements  of  S  converging  to  the  local  minimizer  Jc  of  g  on  S,  and  some 
positive  sequence  {Aa}  such  that  v  =  limA-AA^A  —  x).  If  v  =  0,  the  result  is  trivial.  If  v  7^  0, 
then  limA  ^  =  0. 

Now,  take  34  =  x,  wa  =  fk(xk  ~ x),  and  4  =  we  see  that 

g°(x;v )  >  lim  sup  — — —  =  Umsup  Aa^(xa) -g(x)]  . 

k  4  A 

But,  since  xa  £  S,  {xa}  converges  to  x,  and  x  is  a  local  minimizer  of  g  on  5,  we  have  that  for 
sufficiently  large  k,  Aa[#(xa)  —  g(x)]  is  nonnegative  and  the  first  result  follows. 

Now  assume  that  Vg(x)  is  the  Frechet  derivative  at  x.  Then  by  Theorem  4.14  of  [18], 
Vg(x)rv'  >  0  for  every  v'  £  T(S,x).  Let  v  £  co(T(S,x)).  Then,  there  is  a  nonnegative  co¬ 
efficient  vector  a  such  that  v  =  o.,s,  for  some  ,v,  £  T(S,x).  The  second  result  follows 

from  the  linearity  of  the  inner  product  and  the  definition  of  polar  cone.  If  T(S,x)  contains 
a  positive  spanning  set,  then  co(T(S,x ))  =  R”,  and  therefore,  for  every  v  7^  0,  we  have  that 
v,  — v  £  T(S,x),  and  so  Vg(x)rv  >  0  and  Vg(x)r(— v)  >  0,  which  completes  the  proof.  ■ 

The  approach  we  now  give  for  generating  directions  in  which  Clarke  derivatives  are 
nonnegative  generalizes  the  one  presented  in  [2],  Indeed,  the  following  result  will  be  useful  in 
enlarging  the  set  of  directions,  and  in  addition,  it  relates  the  generalized  directional  derivative 
to  the  class  of  iterative  methods  that  require  decrease  in  some  merit  function  at  each  iteration. 
We  prove  this  more  general  result  first. 

LEMMA  5.5.  Let  g  be  Lipschitz  near  the  limit  x  of  a  sequence  {yk}  .for  which  the  corre¬ 
sponding  values  g(yk)  are  monotone  nonincreasing,  and  for  which  y^  7^  xfor  all  k.  Ifv  is  any 
limit  point  of  the  sequence  |  ^  f  then  v€  ^(^(x),^)  and  g°{x\v)  >0. 

Proof  Let  {34}  and  x  be  as  in  the  above  statement.  There  is  at  least  one  limit  point  v  of 
{  ll’v^-qi  }  since  ^e  unit  ball  is  compact.  Setting  A-a  =  ||^1_f||  in  Definition  5. 1  yields  trivially 
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that  v  £  r(rg(x),x).  Moreover,  Lemma  5.3  implies 


g°(x;v)  =  limsup 

LX,  W— >V,  fjO 


g(y  +  fw)-g(y) 


>  lim  sup 


^(x+ll^-xll^l)  -g(x) 

II  yk-J 


lim  sup 

k—>  oo 


g(yk)  -g(x) 
\\yk-x\\ 


>o . 


5.2.  Choice  of  the  constraint  violation  norm.  The  constraint  violation  function  I12  (x)  = 
1 1 C (x-)  + 1 1  o  will  give  our  best  results  since  it  is  continuously  differentiable  whenever  C  is  (see 
Dennis,  El  Alem  and  Williamson  [11]  for  a  compact  formulation  of  V/Z2).  The  constraint 
violation  function  h\ (x)  =  ||C(x)+||  1  is  another  common  choice,  at  least  for  SQP.  Thus,  the 
question  arises  as  to  the  differentiability  of  h\.  The  answer,  which  implies  that  it  is  rarely 
strictly  differentiable  at  x,  is  given  by  the  following  result.  Recall  that  a  function  g  is  said  to 
be  strictly  differentiable  [19,  8]  at  x  if  lim-y-^jo  8<'y+tv^gf'y'1  =  Vg(x)rv  for  all  v  £  R”,  and  g 
is  said  to  be  regular  [8]  atx  if  for  all  v  €  R",  the  one-sided  directional  derivative  g'(x,  v)  in  the 
direction  v  exists  and  coincides  with  g°(x;v).  In  Section  7.2  we  will  see  an  example  showing 
the  cost  of  this  lack  of  smoothness. 

PROPOSITION  5.6.  IfC  is  regular  at  every  X,  then  so  is  h\.  Let  I(x)  =  {/ :  c,(x)  >  0}  and 
A(x)  =  {  i :  c,  (x)  =  0}  be  the  inactive  and  active  sets  at  x.  Then  the  generalized  gradients  are 
related  by 


3/zi(x)  =  Yj  dc,-(x)  +  j  Y  Y&  :  e  I0’  l]>Ct  €  dci(x),i  £  A(x)|. 

iG/(jc)  ieA(x) 

The  generalized  directional  derivatives  ofh\  and  C  in  a  direction  v  at  x  are  related  by 

h°i(x-,v)  =  Y  c°(x’v)+  E  (c°(x;v))+. 

i£l(x)  ieA(x) 

Thus,  ifC  is  strictly  differentiable  at  x,  then 

h°i(x;v)  =  Y  Vc,(x)rv+  Y  (vqWTv)+. 

i-l(x)  iCA(x) 

Proof.  The  proof  follows  from  various  results  in  [8],  and  some  simple  observations. 
Clarke’s  Propositions  2.3. 12  and  2.3.6  guarantee  that  c,(x)+  thus  h\  (x)  are  regular  at  x  when¬ 
ever  c;(x)  is. 

The  third  corollary  to  Proposition  2.3.3  implies  that  3/zi(x)  =  c)c, (x)  + ,  where  this 
means  all  possible  sums  of  an  element  from  each  3c,(x)+.  Propositions  2.3.12  implies  that 

idcfx)  if  Cj(x )  >  0, 

co{3c,(x),30(x)}  =  {y :  Yi  €  [0, 1  ],C<  £  dcfx)}  if  c,(x)  =  0, 

30 (x)  =  {0}  if  c,(x)  <  0. 

The  generalized  directional  derivative  in  any  direction  v  can  be  written  as:  If  (x;  v)  = 

£,(c,-(x;  v)+)°.  If  c,(x)  >  0  then  (c,(x;v)+)°  =  maxjv7^  :  ^  e  0c,(x)}  =  c°(x;v).  If  c,(x)  <  0 
then  (cKx;  v)+)°  =  max{vrC  :  C  €  50(x)}  =  0.  And  if  cfx)  =  0,  then 


A  PATTERN  SEARCH  FILTER  ALGORITHM 


June  12,  2003 


15 


(c,(x;v)+)°  =  max{vrr|  :  T|  G  8c,-(x)+} 

=  max{vrr| :  r\  G  {y& :  y,  G  [0, 1  ],£,■  G  3c,(x)}} 

f  0  if  max{vr£,-  G  3c,(x)}  <  0 

[  max{vr£;  G  3c,(x)}  otherwise 

=  (max{vr^  G  3c,-(x)})+.  =  (c?(x;v))+. 

The  last  part  of  the  result  follows  by  definition  of  strict  differentiability.  ■ 

Note  that  the  above  result  could  be  slightly  rewritten  by  using  one-sided  directional 
derivatives  instead  of  generalized  directional  derivatives.  Indeed,  if  C  is  regular  at  x,  then 
c°i  (x;  v)  coincides  with  the  one-sided  directional  derivative  c'  (x;  v),  and  h\  (x;  v)  coincides  with 
h[(x;v). 

5.3.  Refining  sequences.  The  purpose  of  the  following  definition  is  to  identify  a  limit  of 
trial  points  and  as  many  directions  as  possible  for  which  Clarke’s  derivatives  are  nonnegative 
at  that  limit.  We  will  make  the  convention,  which  is  implied  by  the  algorithm,  that  for  pk 
to  be  an  active  poll  center  implies  that  polling  around  pk  was  at  least  initiated  at  iteration 
k,  although  function  values  at  all  the  corresponding  poll  set  may  not  have  been  computed 
because  polling  is  allowed  to  stop  if  some  poll  step  yields  an  improved  mesh  point. 

DEFINITION  5.7.  A  convergent  subsequence  of  active  poll  centers  {pk}keK  (for  some 
subset  of  indices  K)  is  said  to  be  a  refining  subsequence  if  limkeK  Aa  =  0.  The  set  of  refining 
directions  for  g  associated  with  a  refining  subsequence  {pk}keK  Is 

Rg(K)  =  {v  G  R":  v  =  C  -  0  and  -  °°  <  g(pk  +  AkQ  <  g(pk  + AkQ  <  °° 

and pk  +  AkC11pk  +  Af%  G  Vkfor  infinitely  many  k  G  A"}, 

where  Vk  C  Pk  are  the  members  of  the  poll  set  visited  by  GPS.  The  set  of  limit  directions  for  g 
associated  with  the  limit  x  of  a  refining  subsequence  {pk}keK  is 

Lg(K)  =  {v  G  R":  3{yk}keK  C  Vk  \  {x~}  such  that  limy*  =  x,  and  g(yk)  >  g(yp) 

k£K 

Vk'  >  k  G  K,  and  v  is  a  limit  point  of  <  - -  >  }. 

)keK 

Figure  5.1  illustrates  an  example  of  refining  directions  for  a  refining  subsequence  pkj: 
The  subsequence  {pkjkeK  converges  to  x  and  has  six  associated  directions,  represented  by 
vectors,  and  four  limit  directions.represented  by  dotted  lines.  Note  that  in  the  above  definition, 
if  ^  =  0  then  the  refining  direction  v  =  C-fi  belongs  to  Dk.  Thus,  all  the  directions  in  infinitely 
many  Dk  where  polling  was  unsuccessful  in  finding  a  better  point  are  refining  directions. 
Also,  if  the  function  is  constant  in  the  refining  direction  v  G  R,AK),  then  — v  also  will  be  a 
refining  direction. 

We  now  show  the  existence  of  refining  subsequences  and  directions,  but  because  the 
limit  directions  depend  on  whether  g  is  /  or  h,  we  postpone  their  existence  results  for  the  next 
section. 

Lemma  5.8.  There  exists  at  least  one  refining  subsequence  composed  of  mesh  isolated 
filter  points.  Let  {pk}keK  be  a  refining  subsequence.  If  there  exists  some  tk  G  Vk  with  —  °°  < 
g(tk)  <  °°for  infinitely  many  k  in  K,  then  the  set  of  refining  directions  Rg(K)  is  nonempty. 

Proof.  Lemma  4.3  guarantees  that  there  exists  a  subsequence  of  iterations  whose  mesh 
size  parameter  goes  to  zero.  The  mesh  size  parameter  Ak  only  decreases  when  the  POLL  step 
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Fig.  5.1.  An  example  of  refining  and  limit  directions.  The  six  refining  directions  ( represented  by  vectors )  use 
the  fact  that  g(p/i.+A^.di)  >  gipc  3-  d2)  >  g(pk  ■  +  A^.d3)  >  gipc )-  The  four  limit  directions  are  represented  by 

dotted  lines. 


shows  that  all  trial  points  in  Pk  are  filtered.  Moreover,  the  assumption  that  all  trial  points 
(thus  all  active  poll  centers)  are  in  a  compact  set  implies  that  one  such  subsequence  has  a 
limit  point.  Thus,  there  exists  a  refining  subsequence  consisting  exclusively  of  mesh  isolated 
poll  centers. 

Let  pk  and  4  be  as  in  the  above  statement.  The  second  result  follows  from  the  fact  that 
the  directions  l>: belong  to  the  finite  set  Z),  and  therefore  there  is  a  direction  d  £  D  used 
infinitely  often.  By  definition,  either  d  or  —d  (or  both)  belongs  to  Rg(K).  ■ 

We  now  show  that  the  generalized  directional  derivative  is  nonnegative  at  limit  points  of 
refining  subsequences  for  all  associated  refining  and  limit  directions  for  g. 

Theorem  5.9.  Let  g  be  Lipschitz  near  the  limit  point  x  of  a  refining  subsequence 
{pk}keK-  Then  g  satisfies  optimality  conditions  at  x  on  cone{Rg{K )  U Lg(K))  in  the  sense 
that  ifv  £  cone(Rg(K)  U Lg(K))  then  g°(x;v)  >  0.  Moreover,  Lg(K )  C  T(Tg(x),x). 

Proof  Let  g  be  Lipschitz  near  the  limit  point  x  of  a  refining  subsequence  { Pk}keK ■  If 
v  =  C  —  4  7^  0  belongs  to  Rg(K)  for  some  £  R",  then  by  definition  of  the  generalized 
directional  derivative  we  have  that: 

*°(*v)  >  limsup8(('’‘  +  Aa  +  A“')~8(w  +  A‘g  >  0. 

keK 

The  same  result  on  cone(Rg(K)  U Lg[K))  follows  from  the  positive  homogeneity  of  the  Clarke 
generalized  directional  derivative.  If  v  belongs  to  Lg(K)  for  some  subsequence  {)'/.  }keK  C 
Vk  \  {%}  converging  to  x,  then  Lemma  5.5  completes  the  proof.  ■ 

The  previous  theorem  implies  that  one  of  the  advantages  of  using  a  large  number  of 
positive  spanning  directions  in  the  algorithm  is  that  the  set  of  directions  for  which  Clarke’s 
generalized  derivatives  are  shown  to  be  nonnegative  will  be  larger. 

The  following  corollary  strengthens  Theorem  5.9  when  g  is  strictly  differentiable  at  the 
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limit  point  x  Assuming  that  g  is  strictly  differentiable  at  x  as  defined  in  Section  5.2  is  equiv¬ 
alent  in  finite  dimensions  to  assuming  that  g  is  Lipschitz  near  x,  Frechet  differentiable  and 
regular  atx  [8]. 

COROLLARY  5.10.  If  g  is  strictly  differentiable  at  x,  then  Vg[x)Tv  >  0  for  every  v  £ 
cc(Rg(K)  U  Lg(K))  and  thus,  ifRg(K)ULg(K )  contains  a  positive  spanning  set,  then  Vg(x)  = 
0. 

Proof.  Assume  that  Vg(x)  is  the  Frechet  derivative  at  x.  Then  Theorem  5.9  ensures 
that  S7g(x)Tv  >  0  for  every  v  £  Rg{K )  U Lg(K).  Let  v;  £  cc(Rg(K)  ULg(K )).  Then,  there  is  a 
nonnegative  coefficient  vector  a  such  that  v;  =  Y.i aisi  for  some  ,v,  £  Rg{K )  U Lg(K).  The  first 
result  follows  from  the  linearity  of  the  inner  product. 

If  Rg{K)  U Lg(K)  contains  a  positive  spanning  set,  then  cc(Rg(K)  U Lg(K))  —  R",  and 
therefore,  for  every  v  0,  we  have  that  v,  — v  £  Rg{K)  U Lg(K),  and  so  Vg(x)rv  >  0  and 
Vg(x)r(— v)  >  0,  which  completes  the  proof.  ■ 

6.  Optimality  conditions  for  the  Filter-GPS  method.  We  will  continue  with  results 
that  only  consider  the  behavior  of  /;,  and  then  complete  our  results  by  analyzing  the  effect  of 
the  filter  on  the  objective  function  /. 

6.1.  Results  for  the  constraint  violation  function.  The  algorithm  definition  gives  pri¬ 
ority  to  feasibility,  as  expressed  by  the  constraint  violation  function,  over  minimizing  the 
objective  function.  A  consequence  of  this  is  that  the  optimality  conditions  guaranteed  by 
the  algorithm  are  stronger  for  h,  i.e.,  achieving  feasibility,  than  for  they  are  for  achieving 
constrained  optimality.  Indeed,  in  the  absence  of  the  assumption  of  linearly  independent  con¬ 
straint  gradients,  our  feasibility  results  are  what  one  would  prove  for  standard  SQP  methods 
-  that  we  obtain  a  stationary  point  of  the  norm  of  the  constraint  violations. 

A  first  obvious  comment  is  that  h(pk)  =  hx(pk)  for  every  poll  center  pk  since  trial  points 
violating  some  linear  constraints  are  rejected.  Therefore,  any  limit  of  poll  centers  belongs  to 
X.  So  the  analysis  can  be  done  in  terms  of  h  instead  of  hx-  Another  observation  is  that,  by 
definition,  if  h(x)  =  0  then  x  is  a  global  minimizer  for  h.  Furthermore,  any  limit  point  of  a 
sequence  of  feasible  points  would  be  feasible  if  h  were  lower  semicontinuous  there,  or  if  the 
feasible  region  is  closed.  However,  it  is  possible  for  a  sequence  of  least  infeasible  poll  centers 
to  converge  to  an  infeasible  point.  We  will  therefore  concentrate  on  limit  points  of  infeasible 
mesh  isolated  poll  centers. 

Before  presenting  results  that  assume  local  Lipschitz  continuity,  we  prove  the  following 
result,  which  shows  in  particular  that  if  any  limit  point  of  least  infeasible  poll  centers  is 
feasible  and  if  h  is  continuous  there,  then  all  limit  points  at  which  h  is  lower  semicontinuous 
are  also  feasible.  It  also  provides  a  way  to  identify  some  limit  directions  in  L/fK). 

THEOREM  6.1.  Let  {p[}keK  be  a  convergent  subsequence  of  least  infeasible  poll  cen¬ 
ters.  Then  lim/,  h{ pf)  exists,  and  if  h  is  lower  semicontinuous  at  any  limit  point  x  of  {p[}, 
then  lim  kh(pfy  >  h(x)  >  0.  Every  limit  point  of  least  infeasible  poll  centers  at  which  h  is 
continuous  has  the  same  constraint  violation  function  value.  Furthermore,  if  h  is  Lipschitz 

near  any  x,  then  h°(x;v)  >  0  for  any  limit  direction  v  of  <  ,,PJ  *  >.  In  addition,  each  limit 

l  \\Pk~xW  J 

direction  satisfies  v  £  T(Tifx),x). 

Proof.  The  sequence  {hip1,)}  is  convergent  because  it  is  a  nonincreasing  sequence  of 
positive  numbers.  Of  course,  for  any  subsequence  of  {/?[},  the  corresponding  h  values  have 
the  same  limit.  Thus,  if  h  is  lower  semicontinuous  at  x,  we  know  that  for  any  subsequence 
{pk}keK  of  the  iteration  sequence  that  converges  to  Jc, 

lim h(p{)  =  lim h(p{)  =  liminf h(p[)  >  h{x)  >  0  . 
k  keK  keK 
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If  h  is  continuous  at  some  limit  points  of  infeasible  poll  centers,  then  the  same  argument 
shows  that  all  such  limit  points  have  the  same  value  of  the  constraint  violation  function.  Thus, 
if  any  such  limit  point  is  feasible,  they  all  are  feasible. 

The  rest  of  the  proof  follows  by  noticing  that  v  £  Li,(K ),  and  by  applying  Theorem  5.9.  ■ 

The  next  result  guarantees  some  nonsmooth  first  order  optimality  conditions.  It  shows 
that  Clarke  derivatives  for  h  are  nonnegative  in  a  subset  of  refining  directions  whose  convex 
conic  hull  is  the  tangent  cone  to  X. 

PROPOSITION  6.2.  Let  x  be  the  limit  of  a  refining  subsequence  composed  of  mesh  iso¬ 
lated  filter  points  {pk}keK,  and  assume  that  the  rule  for  selecting  Dk  conforms  to  X  for  an 
e  >  0.  If  h  is  Lipschitz  near  x,  then  h°(x;v)  >  0  for  any  direction  v  in  a  set  of  directions 
D'  C  Ri,(K)  satisfying  cc(D')  =  T(X  ,x). 

Proof  Let  {pk}keK,  £  and  x  be  as  in  the  statement  of  the  result.  When  h  (x)  =  0,  h°  (x;  v)  > 
0  for  any  v  £  M",  so  assume  that  h(x)  >  0.  Since  the  rule  for  selecting  /)/  conforms  to  X  for 
an  e  >  0,  then  there  exists  a  subset  of  directions  D'  of  D  such  that  cc(D')  =  T(X,x),  and  for 
any  v  £  D'  and  sufficiently  large  k  £  K,  pk  +  Ak\’  £  PkUX  and  h(pk  +  A^v)  >  0.  However, 
since  the  poll  centers  are  mesh  isolated  filter  points,  it  follows  that  h(pk  +  A^v)  >  h(pk ),  and 
therefore  v  belongs  to  Ri,(K).  Theorem  5.9  completes  the  proof.  ■ 

A  consequence  of  this  result  is  that  if  h  is  strictly  differentiable  at  the  limit  point  of  a 
refining  subsequence  composed  of  mesh  isolated  filter  points,  then  standard  first  order  opti¬ 
mality  conditions  for  h  are  satisfied. 

COROLLARY  6.3.  Let  x  be  the  limit  of  a  refining  subsequence  composed  of  mesh  isolated 
filter  points  {pk}keK,  and  assume  that  the  rule  for  selecting  Dk  conforms  to  X  for  an  e  >  0. 
Ifh  is  strictly  differentiable  at  x,  then  V/z(x)rv  >  0  for  every  v  in  T(X,x). 

Proof  The  result  is  a  direct  consequence  of  Corollary  5.10  and  Proposition  6.2.  ■ 

6.2.  Results  for  the  objective  function.  We  have  shown  above  that  the  limit  point  for  a 
refining  subsequence  generated  by  the  algorithm  satisfies  local  optimality  conditions  for  the 
constraint  violation  function.  We  now  derive  some  results  for  the  objective  function.  The  first 
result  proposes  a  way  to  identify  some  limit  directions  in  LAK). 

PROPOSITION  6.4.  Let  {Pk}keK  be  a  convergent  subsequence  of  feasible  poll  centers. 
If  f  is  lower  semicontinuous  at  x,  then  lim kf(,Pk)  exists  and  is  greater  than  or  equal  to  /(x). 
The  set  of  such  limit  points  at  which  f  is  continuous  all  have  the  same  objective  function 

value.  Furthermore,  if  f  is  Lipschitz  near  any  x,  then  any  limit  direction  v  of  j  j-4  *  >  is 

1  \\Pk  -*11  J 

such  that  /°(x;v)  >  0  and  v  £  T(Ll,x)  D  7’(r y(x),x). 

Proof.  Let  {Pk}keK  and  x  be  as  in  the  above  statement.  The  subsequence  f(Pk)keK  *s 
monotone  nonincreasing  and  bounded  below  by  the  finite  value  /(x),  and  therefore  it  con¬ 
verges. 

Since  pf  £  LI  for  every  k  £  K,  it  follows  by  definition  of  the  contingent  cone  that 
t'  £  T(Ll,x).  The  rest  of  the  proof  follows  by  noticing  that  v  £  Lf(K),  and  by  applying 
Theorem  5.9.  ■ 

The  next  pair  of  results  guarantees  some  nonsmooth  first  order  optimality  conditions 
related  to  the  cone  tangent  to  X.  They  are  similar  to  the  first  order  optimality  results  for 
h.  Proposition  6.2  and  Corollary  6.3,  except  that  they  require  the  limit  point  to  be  strictly 
feasible  with  respect  to  LI.  Basically,  these  results  show  that  when  the  nonlinear  constraints 
are  not  binding,  the  use  of  the  filter  does  not  interfere  with  the  linearly  constrained  results. 

PROPOSITION  6.5.  Let  x  be  the  limit  of  a  refining  subsequence  composed  of  mesh  iso¬ 
lated  filter  points  {pk}keK,  and  assume  that  the  rule  for  selecting  Dk  conforms  to  X  for  an 
£  >  0.  If  f  is  Lipschitz  near  x,  and  if  x  is  strictly  feasible  with  respect  to  LI,  then  /°(x;  v)  >  0 
for  any  direction  v  in  a  set  of  directions  D'  C  R/fK)  satisfying  cc{D')  =  T  (X,x). 
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Proof.  Let  { pi_  }keK,£  and  x  be  as  in  the  statement  of  the  result.  Since  the  rule  for 
selecting  I)/(  conforms  to  X  for  an  e  >  0,  then  there  exists  a  subset  of  directions  D'  of  D  such 
that  cc(D')  =T(X,x),  and  for  any  v  £  O'  and  sufficiently  large  k  £  K,  p k  +  A^v  £  Pk  U X  U  LI. 
However,  since  the  poll  centers  are  mesh  isolated  filter  points,  it  follows  that  f(pk  +  A^v)  > 
f(pk ),  and  therefore  v  belongs  to  R f(K).  Theorem  5.9  completes  the  proof.  ■ 

The  following  corollary  to  this  result  shows  standard  first  order  optimality  conditions  for 
/  on  X  under  the  additional  assumption  of  strict  differentiability. 

COROLLARY  6.6.  Let  x  be  the  limit  of  a  refining  subsequence  composed  of  mesh  isolated 
filter  points  {pk}keK,  and  assume  that  the  rule  for  selecting  Dk  conforms  to  X  for  an  £  >  0. 
If  f  has  a  strict  derivative  V/(x)  at  x,  and  if  x  is  strictly  feasible  with  respect  to  LI,  then 
V f(x)Tv  >  0  for  every  v  in  T ( X,x ). 

Proof  The  result  is  a  direct  consequence  of  Corollary  5.10  and  Proposition  6.5.  ■ 

Note  that  since  it  is  assumed  in  Corollary  6.6  that  x  is  feasible  with  respect  to  LI,  and 
since  the  algorithm  reduces  to  the  one  in  [22,  2]  in  the  absence  of  general  constraints,  the 
proof  of  the  corollary  also  follows  from  a  result  in  [2], 

Our  next  result  does  not  assume  strict  feasibility  of  the  limit  point.  It  is  a  corollary 
of  Corollary  5.10.  It  gives  conditions  for  the  limit  point  of  a  refining  sequence  to  satisfy 
optimality  conditions  on  problem  (1.1).  It  is  that  the  convex  conic  hull  of  the  union  of  the 
refining  and  the  limit  directions  contains  the  contingent  cone  for  the  feasible  region  at  x.  An 
interesting  aside  is  that  this  condition  can  be  met  without  any  feasible  descent  directions  in 
any  poll  set.  In  the  simple  linearly  constrained  case,  this  is  implied  by  ensuring  that  the 
polling  directions  conform  to  the  boundary  of  X,  but  here,  the  corresponding  assumption  that 
the  polling  directions  for  a  refining  sequence  generate  the  contingent  cone  for  the  feasible 
region  at  x  is  not  as  constructive.  This  result  is  illustrated  on  the  three  examples  of  Section  7. 

PROPOSITION  6.7.  Let  x  be  the  limit  point  of  a  refining  subsequence  {pk}keK,  If  f 
has  a  strict  derivative  V/(x)  at  x,  then  — V/(x)  belongs  to  the  polar  C°f  of  Cf  =  cc(Rf(K)  U 
Lf{K)),  and  so  x  satisfies  the  optimality  conditions  of  Proposition  5. 10  for  f  on  Cf.  Moreover, 
if  T(Ll  nX,f)  C  Cf,  then  x  satisfies  the  optimality  conditions  of  Corollary  5.10  for  f  on 
Problem  (1.1). 

Proof.  Let  x,  f  and  Cf  be  as  in  the  above  statement.  Corollary  5.10  guarantees  that 
V/(x)rv  >  0  for  any  vector  v  £  Cf.  The  results  follows  from  the  definition  of  polarity: 
In  general  —V f(x)  £  Cf  =  {u  £  M"  :  uTv  <  0  Vv  £  C/}.  If  Cf  L  T(Li  C\X,x),  then  Cf  C 
r°(f2nA_,x),  and  the  proof  is  complete.  ■ 

Remark:  Notice  that  under  the  assumption  that  the  contingent  cone  generators  of  the 
nonlinear  constraints  binding  at  x  belong  to  the  set  of  refining  or  limit  directions  (as  will 
be  the  case  for  linear  constraints  and  conforming  directions,  see  Definition  4.1),  then  the 
preceding  result  reduces  to  the  corresponding  result  from  [2,  22],  This  is  because  in  that  case 
the  contingent  cone  is  the  tangent  cone,  and  the  polar  of  the  contingent  cone  is  the  normal 
cone  so  x  is  a  KKT  point. 

By  using  a  filter  based  step  acceptance  criterion,  we  have  overcome  a  difficulty  in  ap¬ 
plying  pattern  search  algorithms  to  constrained  optimization.  Specifically,  that  the  objective 
function  descent  directions  in  the  positive  spanning  set  D  may  be  infeasible.  Lewis  and  Tor- 
czon  [20]  give  an  example  where  a  nonfilter  version  of  the  pattern  search  algorithm  stalls 
( i.e .,  all  subsequent  iterates  are  the  same  mesh  isolated  filter  point),  at  a  point  containing  a 
strictly  feasible  descent  direction.  The  following  result  shows  that,  under  assumptions  on 
the  smoothness  of  the  functions  but  regardless  of  the  choice  of  positive  spanning  set,  our  al¬ 
gorithm  will  eventually  find  an  unfiltered  mesh  point,  except  when  Vf(pk)  =  0.  This  is  an 
essential  ingredient  of  any  method  with  ambitions  to  find  more  than  a  single  local  constrained 
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minimizer. 

PROPOSITION  6.8.  If  h  and  f  are  both  strictly  differentiable  at  the  poll  center  pk,  and 
ifVf(pk)  0,  then  there  cannot  be  infinitely  many  consecutive  iterations  where  pk  is  a  mesh 
isolated  filter  point. 

Proof  Let  h  and  /  be  strictly  differentiable  at  pk  where  Vf(pk)  'f  0.  Assume  that 
there  are  infinitely  many  consecutive  iterations  where  pk  is  a  mesh  isolated  filter  point.  Let 
d  be  a  direction  used  infinitely  often  in  the  (constant)  subsequence  of  poll  centers  such  that 
Vf(Pk)Td  <  0. 

Since  the  function  h  is  strictly  differentiable  at  pt  .  there  exists  an  8  >  0  such  that  one  of 
the  two  conditions  is  satisfied:  either  hx(pk  +  Ad)  <hx(pk)  <  hmax  or  hx(pk+Ad)  >hx(pk ) 
for  all  0  <  A  <  8. 

If  the  first  condition  is  satisfied,  then  for  A/;  <  e  the  POLL  step  will  find  an  unfiltered 
mesh  point.  This  is  a  contradiction.  If  the  second  condition  is  satisfied,  then  let  h  be  the 
smallest  value  of  { hx(x )  :  hx(x)  >  hx(pk),x  G  fik}  U  {hmax},  and  let  /  be  the  corresponding 
objective  function  value,  i. e. ,  either  /  =  f(x)  for  some  vector  x  €  fik  that  satisfies  hx(x)  =  h, 
or  /  =  —  °°  in  the  case  that  h  =  hmax.  It  follows  that  h  >  hx(pk)  and  /  <  f(pk )■  Therefore, 
whenever  Ak  <  8  is  small  enough,  the  following  inequalities  hold:  hx(pk)  <  hx(pk  +  Akd)  <h 
and  /  <  f(pk  +  Akd)  <  f(pk),  thus  the  trial  mesh  point  is  unfiltered.  This  is  a  contradiction. 


7.  Illustration  of  our  results.  We  now  illustrate  the  behavior  of  our  algorithm  on  three 
test  examples  and  on  a  real  engineering  problem.  The  first  test  example  is  due  to  Lewis  and 
Torczon  [20].  Unlike  the  barrier  approach  in  [20],  the  filter  approach  can  converge  even  with 
a  badly  chosen  positive  spanning  set. 

The  second  example  justifies  our  choice  of  the  squared  I2  norm  over  the  t\  norm  in 
the  definition  of  the  constraint  violation  function.  The  non-smoothness  of  the  latter  may  not 
provide  descent  on  h\  in  some  of  the  poll  directions  for  which  hi  does  descend.  The  example 
shows  that  since  h  \  does  not  allow  movement,  using  it  can  result  in  stalling  at  an  infeasible 
point. 

The  third  example  shows  the  limitations  of  our  results;  there  is  more  left  to  do.  This 
example  uses  the  algorithm’s  flexibility  as  a  loophole  to  avoid  a  desirable  outcome.  Even 
with  the  squared  £ 2  norm,  it  is  still  possible  to  choose  the  positive  spanning  sets,  and  to  be 
unlucky,  in  a  way  that  there  is  a  polling  direction  which  is  a  feasible  descent  direction  for  the 
objective  function  /  from  the  limit  point  x.  This  does  not  contradict  our  results,  but  it  does 
show  their  limitations  without  a  suitable  SEARCH  scheme. 

The  last  example  is  a  wing  planform  design  problem  from  Boeing  for  a  different  airplane 
than  the  two  airplanes  used  to  generate  the  results  reported  in  [3], 

7.1.  Example  of  Lewis  and  Torczon.  Consider  the  linear  program  [20] 

min  —a  —  2b 

x—{a,b)T 

s.t.  0  <  a  <  1 
b<  0. 

The  optimal  solution  is  x  —  ( 1 .  0jy.  Let  us  apply  our  algorithm  with  initial  point  xo  =  (0,0)r 
and  initial  mesh  size  parameter  Ao  =  1  and  with  a  single  positive  spanning  matrix  Z)/.  =  D 
constructed  with  the  four  directions  ±(1, 1  )r  and  ±(1,  —  l)r.  We  will  not  use  any  SEARCH 
step  for  this  example.  It  is  pointed  out  in  [20]  that  all  iterations  of  a  “barrier”  pattern  search 
algorithm  that  assigns  an  objective  function  value  of  4- 00  to  infeasible  points  but  does  not  take 
into  consideration  the  geometry  of  the  feasible  region,  remain  at  the  origin  since  the  polling 
directions  that  yield  decrease  in  the  objective  function  are  infeasible. 
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Suppose  that  the  constraints  are  given  as  black  boxes,  and  that  the  algorithm  is  not  aware 
that  they  are  linear.  Therefore  X  =  R2  and  for  any  x  £  R2  the  value  C(x),  and  thus  h(x)  = 
hx(x )  can  be  computed. 

One  might  consider  using  the  unconstrained  GPS  on  an  l\  exact  penalty  function  for  this 
problem.  It  turns  out  that  for  any  penalty  constant  greater  than  or  equal  to  3,  the  algorithm 
with  the  same  starting  data  never  moves  from  the  origin.  The  penalty  constant  must  be  greater 
than  2  for  the  problem  to  have  the  same  solution,  so  the  penalty  function  approach  is  not  useful 
here. 

Our  filter  algorithm,  using  the  above  mentioned  spanning  set  converges  to  the  optimal 
solution.  Mesh  directions  that  conform  to  the  boundary  of  the  feasible  region  cannot  be 
identified.  Figure  7. 1  displays  the  first  few  iterations.  The  shaded  area  is  the  feasible  region. 
The  poll  centers  are  underlined,  and  the  functions  values  are  displayed  between  brackets: 
[h(x),f(x)\.  The  points  in  the  poll  set  are  joined  to  the  poll  center  by  dotted  lines. 


(a)  (b) 

Legend:  [h(x),f(x)\ 


Fig.  7.1.  First  iterations  on  example  from  Lewis  and  Torczon. 


a 


Figure  7.1(a)  illustrates  the  iterations  for  which  =  1 .  Starting  at  xq  =  po=  (0, 0) T  the 
algorithm  evaluates  both  functions  at  ±(1,  l)r  and  ±(1,  —  l)r.  Only  the  trial  point  (1,-1  )T 
is  feasible;  it  is  however  dominated  by  xo-  The  point  (1.1)7  dominates  the  two  other  trial 
points  and  is  unfiltered. 

Let  x\  =  p\  = (1, l)r.  The  functions  are  evaluated  at  the  four  points  around  p  \  and  two 
unfiltered  points  are  found:  X2  =  (0, 2)7  and  (2. 2) 7 .  Even  if  an  unfiltered  mesh  point  was 
found,  the  poll  center  p2  remains  at  p\ .  Polling  around  pi  yields  filtered  points,  thus  pi  is 
a  mesh  isolated  filter  point.  Figure  7.2  displays  the  filters  corresponding  to  the  iterates  in 
Figure  7.1.  Figure  7.1(b)  starts  at  iteration  3  with  pi  =  (1,  l)r  and  A3  =  j.  Two  consecutive 
iterations  where  an  unfiltered  mesh  point  is  found  lead  to  P4  =  (|,  J, ) 7  then  p$  =  ( 1 , 0) 7  , 
which  is  the  optimal  solution. 

However,  since  the  gradient  is  nonzero  at  this  point.  Proposition  6.8  ensures  that  polling 
around  this  point  will  eventually  produce  an  unfiltered  mesh  point.  Indeed,  as  shown  in 
Figure  7.1(c),  iteration  5  produces  a  mesh  isolated  filter  point,  but  iteration  6  generates  an 
unfiltered  mesh  point,  which  is  a  new  infeasible  incumbent  with  minimal  constraint  violation. 
For  this  example,  the  limit  point  is  feasible,  and  so  it  is  a  global  optimizer  for  the  constraint 
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violation  function.  The  set  of  refining  and  limit  directions  for  /  are 


Rf(K) 

Lf(K) 


-2 

0 


1 

T2 


s/2 


1 

-1 


The  polar  of  the  cone  spanned  by  the  refining  and  limit  directions  of  /  is  spanned  by  {(1 ,  l)r, 
(0, 1  )7  },  and  indeed  contains  —  V/(1,0)  =  ( 1 , 2)T  (as  stated  by  Proposition  6.7).  Moreover, 
non-negative  combinations  of  the  last  two  directions  of  Rf(K)  span  the  contingent  cone  to 
the  constraints,  and  therefore  the  solution  satisfies  the  KKT  conditions. 

Remark:  Observe  that  the  choice  of  the  poll  centers  is  also  important  to  the  quality 
of  the  limit  points  the  algorithm  finds.  Indeed,  in  this  example,  if  one  were  to  always  take 
the  current  poll  center  to  be  the  best  feasible  incumbent,  then  the  refining  sequence  will  have 
every  term  equal  to  vo.  C/  is  again  spanned  by  the  same  set  of  directions  Rf(K). 

Of  course,  continuing  to  poll  around  an  unchanging  feasible  incumbent  is  a  bad  idea 
since  it  ignores  the  flexibility  of  the  filter  method  by  reducing  it  to  the  barrier  method.  A 
better  poll  center  selection  strategy  could  be  to  alternate  between  the  two  incumbents  every 
time  the  POLL  step  detects  a  mesh  isolated  filter  point.  Polling  around  the  infeasible  one  with 
minimal  constraint  violation  is  especially  interesting  when  f1  is  less  than  fF  since  it  might 
move  the  iterates  away  from  a  local  optimum,  or  toward  a  more  interesting  part  of  the  feasible 
region.  That  way,  there  will  be  infinitely  many  poll  steps  around  both  types  of  incumbents.  It 
is  also  worthwhile  to  change  the  positive  spanning  set  to  enrich  the  set  of  refining  directions 
for  both  h  and  /.  The  flexibility  of  our  theory  ensures  that  such  heuristics  can  be  part  of  a 
rigorously  convergent  algorithm. 

7.2.  Choice  of  the  constraint  violation  norm.  The  choice  of  the  norm  in  the  definition 
of  the  constraint  violation  function  h(x)  =  ||C(x)  +  |j  affects  the  convergence  behavior.  The 
example  presented  here  complements  the  theoretical  results  of  section  5.2.  We  prefer  the 
squared  (2  norm  over  the  £\  norm  since  it  is  differentiable  whenever  the  constraint  function  C 
is  (see  [11]  for  an  explicit  formulation  of  the  gradient).  This  means  that  if  there  is  a  descent 
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direction,  then  a  positive  spanning  set  will  detect  it  with  the  £2  norm,  but  t\  might  miss  it  (see 
Corollary  6.3).  This  is  illustrated  in  the  following  simple  linear  program 

min  b 

x=(a.b)T 

s.t.  —  b  <3a<b 
b>  1 

with  t\  constraint  violation  function  hi(a,b)  =  max(3a  —  b,0)  +max(— 3a  —  b,0)  +max(l  — 
b,0). 

Let  the  algorithm  start  at  the  infeasible  point  *0  =  Po  =  (0,0)r,  and  let  the  positive 
spanning  setbeD={(l,l)r,(l,  — l)r,(0,  — l)r}.  The  iterates  and  the  filter  are  depicted  in 
Figure  7.3. 


Legend:  [hi(x),f(x)] 


Fig.  7.3.  The  algorithm  stalls  with  the  t\  norm. 


Every  even  iteration  k  produces  an  unfiltered  mesh  point  that  does  not  improve  any  of 
the  incumbents:  The  trial  point  x^+x  =  Pk  +  A^(l,  l)r  is  unfiltered  by  fa.  Every  odd  iter¬ 
ation  confirms  that  the  poll  center  is  a  mesh  isolated  filter  point:  The  three  trial  points  are 
filtered  since  pk+i  =  Pk  =  (0. 0)T .  Therefore,  the  mesh  size  parameter  is  reduced  at  each  odd 
iteration. 

The  sequence  of  poll  centers  stalls  at  the  infeasible  point  x  =  (0, 0) T .  This  means  that  the 
non-differentiability  of  h\  hides  the  descent  directions  for  the  constraint  violation  function. 
It  also  means  that  this  is  another  example  where  the  unconstrained  GPS  algorithm  applied 
to  the  exact  penalty  function  fails  as  a  solution  approach.  However,  as  guaranteed  by 
Theorem  5.9,  h°  (x.  v)  is  nonnegative  for  the  positive  spanning  directions  in  D  as  well  as  for 
other  refining  and  limit  directions  for  h: 


The  sets  of  refining  and  limit  directions  for  /  are 


Rf(K) 

Lf(K) 
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The  polar  of  the  cone  spanned  by  the  refining  and  limit  directions  for  /  reduces  to  the  negative 
gradient  of /:  { (0,  —  1) r  } . 

Remark:  If  the  squared  to  norm  is  used  for  h  instead  of  i\,  then  the  poll  center  moves 
away  from  (0,0)  to  (Ak,Ak)T  as  soon  as  the  mesh  size  parameter  drops  below  |  since  0  < 
h( Ak,  Ak)  <  1  whenever  0  <  A/.  <  ?.  The  set  of  refining  and  limit  directions  for  /  are  the 
same  as  above  but  the  algorithm  converges  to  a  global  optimal  solution. 

7.3.  Illustration  of  the  limitation  of  the  results.  Consider  the  problem 

min  b 

x—(a,b)T 

s.t.  a(l  —  a)  —  b<0. 

The  algorithmic  strategies  described  below  are  such  that  the  algorithm  goes  through  infinitely 
many  consecutive  cycles  of  three  iterations,  and  the  sequence  of  iterates  converge  to  a  feasible 
limit  point  containing  a  feasible  descent  direction  used  infinitely  often  by  the  iterates.  The 
first  iteration  of  each  cycle  improves  the  feasible  incumbent,  the  second  one  improves  the 
least  infeasible  incumbent,  and  the  last  one  produces  a  mesh  isolated  filter  point.  We  admit 
that  the  flexibility  in  the  choice  of  polling  directions  is  exploited  to  lead  to  a  weak  result,  but 
our  point  is  that  it  can  happen. 

The  trial  points  generated  during  cycle  t  are  summarized  in  Table  7.1.  The  algorithm 
does  not  perform  any  SEARCH,  and  complete  polling  is  always  performed.  The  table  also 
displays  the  positive  spanning  directions  used  at  each  POLL  step.  The  initial  points  in  cycle 
1=  1  are  po  =  x!  =  ( \ ,  0) T  and  xF  =  (0, 1 ) T ,  and  the  initial  mesh  size  parameter  is  Aq  =  g . 


Table  7.1 

Description  of  the  three  iterates  of  cycle  t  with  initial  incumbents  xF 
function  values  [h(xF),f(xF )]  =  jo,  and  [h(x,),f(xJ)\  =  ,oj  ■ 


and 


Mesh 

Poll 

Poll 

Trial 

Size 

Center 

Dirs 

Points 

Comments 

(A*) 

(Pk) 

[hJ] 

(Dk) 

(°>  2<+?) 


1 

2?+2 


(2FO’0) 


(-2,1) 


2m-l  n 

4f+l 


(2,0) 


(f’°) 


(0,-1) 


YET 

,2f+‘  1  iM  ) 


0, 


2e+2 


2—1 
4'  1 


xh  is  improved 


filtered  by  poll  center 


3x2  —  1  -1 


4f  t':  )  2f+ 2 


unfiltered 


1 

ff+2 


2^)  2f+7 


(0,-2)  0, 


1  -1 

2^+2  )  fppl 


(1,-D 


1 

2^+2 


,° 


(-1,2) 


(  2^+-  ’  2*+2  ) 


lt+2-\ 


4^+2 


- 


unfiltered 


x1  is  improved 


filtered  by  poll  center 


2/+2  Q't2-()) 


(1,0) 


2f+t 5 


2<+2-l  A 

4^+2  5 u 


T-^+l  _ 

~4 i+r 


(-1,1) 


1  2e+2  , 


(-1,-1) 


I  0,  jjri 


0, 


2f+2 


already  in  filter 


already  in  filter 


1  -1 

2^+2  »  2^+2 


already  in  filter 


Figure  7.4  displays  the  first  cycle  (polling  around  the  poll  centers  po,Pi  and  po)  and 
the  corresponding  filter.  Cycle  1  terminates  with  a  mesh  isolated  filter  point,  and  cycle  2 
is  initiated  at  py  =  (g,0)r  with  A3  =  More  generally,  cycle  t  terminates  with  a  mesh 
isolated  filter  point.  The  mesh  size  parameter  is  divided  by  2,  and  cycle  t+l  starts. 
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All  trial  points  including  the  sequence  of  mesh  isolated  filtered  points,  i.  e. ,  the  iterates 
corresponding  to  the  third  step  of  the  cycles,  converge  to  the  feasible  point  x  =  (0.0) 7 .  There 
are  no  other  limit  points.  The  results  of  Section  6.1  concerning  the  constraint  violation  func¬ 
tion  are  clearly  satisfied  since  the  limit  point  is  at  the  global  minimum  of  h.  However,  there 
is  a  feasible  direction  from  the  limit  point  used  infinitely  often  by  the  subsequence,  which  is 
also  a  descent  direction  for  the  objective  function.  The  sets  of  refining  and  limit  directions 
for  /  are 


Rf(K) 

Lf(K) 


-1 

0 


1 

’7m 


-i 

3 


The  polar  of  the  cone  spanned  by  these  directions  is  the  cone  spanned  by  the  single  direction 
(0,  — l)r,  which  is  the  gradient  at  the  origin.  Thus,  Proposition  6.7  is  again  sharp.  The 
contingent  cone  at  the  origin  is  the  half  space  a  —  b<  0,  and  the  intersection  of  the  contingent 
cone  at  the  origin  with  C/,  the  cone  generated  by  convex  conic  hull  of  Rf(K)  U Lf(K),  is  the 
convex  conic  hull  of  (— 2,0)T  and  (1,  l)r. 

7.4.  Filter  results  on  a  Boeing  planform  design  application.  The  GPS  filter  algorithm 
has  been  applied  often  to  Boeing  wing  planform  design  problems.  The  wing  planform  is  the 
two-dimensional,  downward,  vertical  projection  of  the  wing.  The  design  variables  are  the 
line  segment  end  point  for  the  wing  leading  edges,  trailing  edges,  and  spars.  Also  there  are 
variables  related  to  wing  thickness  and  aerodynamic  loading  [3],  A  typical  design  problem 
is  to  minimize  direct  operating  cost  subject  to  several  constraints.  The  constraints  include 
required  range,  maximum  approach  velocity,  maximum  required  runway  length,  and  several 
others.  The  analysis  code  is  a  sophisticated  combination  of  preliminary  design  tools  from 
many  disciplines.  The  disciplines  include  structures,  aerodynamics  weights,  costing,  and 
configuration  management. 

This  problem  has  17  variables,  13  nonlinear  constraints,  and  no  linear  constraints.  The 
best  point  in  the  initial  surrogate  (a  kriging  model  that  interpolates  data  from  200  points 
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obtained  from  an  orthogonal-array-based  Latin  hypercube)  is  the  least  infeasible  point,  which 
has  a  constraint  violation  of  0.426  and  an  objective  of  9.845. 

Filter  after  1 5  evaluations 

9.85 
9.8 
-  9.75 
9.7 
9.65 
9.6 

0  5  10  15  20  25  0  0.2  0.4  0.6 

h  h 

Filter  after  50  evaluations  Zoom  of  filter  on  left 


Zoom  of  filter  on  left 


x 


Filter  after  1 1 7  evaluations 


Fig.  7.5.  Filter  progression  on  a  Boeing  planform  design  application 


Figure  7.5  illustrates  the  progression  of  the  filter  for  this  application.  In  all  plots,  the 
symbol  x  represents  the  initial  point,  except  in  the  bottom  right  plot  due  to  the  scale  change. 
The  top  two  plots  correspond  to  the  first  15  function  evaluations,  the  middle  ones  the  first  50 
and  the  bottom  ones  are  the  whole  filter  after  completing  the  1 17  evaluations  after  the  initial 
200.  The  initial  point  gets  filtered  at  the  3rd  function  evaluation.  The  first  feasible  point  is 
found  at  the  58th  evaluation.  The  best  feasible  point  is  denoted  on  the  two  bottom  plots  by  a 
star  at  (0,9.75)r. 

The  bottom  left  plot  contains  several  trial  points  with  an  objective  function  value  near 
9.6.  This  suggests  that  the  SEARCH  strategy  tried,  but  was  unsuccessful  in  finding  a  feasible 
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design  with  such  a  low  /  value. 

7.5.  Discussion.  Though  the  algorithm  behaved  very  well  on  the  industrial  design  prob¬ 
lem  above  (as  well  as  on  those  in  [3]  and  others),  at  first  glance,  one  might  be  unimpressed 
by  the  behavior  of  the  algorithm  on  the  academic  examples  of  the  last  section.  Of  course, 
they  were  designed  to  illustrate  the  tightness  of  our  convergence  results,  and  the  crucial  di¬ 
rectional  dependence  of  GPS  methods.  Our  interest  is  for  optimization  problems,  like  the 
planform  problem,  where  derivative  based  methods  are  impractical.  Our  algorithm  can  only 
rely  on  function  values,  and  sometimes  even  these  values  are  not  reliable.  A  design  example 
is  presented  in  [4,  5]  where  two  times  out  of  three  the  evaluation  of  the  objective  function 
failed  to  produce  a  value. 

A  consequence  of  this  absence  of  structure  is  that  the  convergence  results  that  are  guar¬ 
anteed  depend  in  part  on  the  set  of  directions  used  in  the  POLL  step.  Indeed,  the  richer  the 
set  of  directions,  the  stronger  the  convergence  result,  since  adding  directions  can  increase 
the  number  of  refining  and  limit  directions  and  widen  the  cone  C/  of  Proposition  6.7,  and 
hence  narrow  its  polar  cone  where  the  negative  gradient  of  the  objective  is  shown  to  reside. 
Intuitively,  if  a  POLL  step  identifies  a  poll  center  that  is  a  mesh  isolated  filter  point,  then,  the 
next  time  a  POLL  is  performed  there  (with  a  reduced  mesh  size  parameter)  it  would  be  natural 
to  use  a  different  positive  spanning  set  to  increase  the  likelihood  of  detecting  an  eventual  de¬ 
scent  direction.  However,  essential  to  the  convergence  proof,  is  a  finite  total  number  of  polling 
directions.  It  follows  that  one  cannot  attempt  to  obtain  a  dense  set  of  polling  directions. 

In  practice,  one  would  never  use  a  pattern  search  algorithm  following  the  rules  upon 
which  these  examples  are  based.  First,  we  would  use  a  SEARCH  step  such  as  a  space-filling 
latin  hypercube  sampling,  or  a  surrogate  based  exploration  or  a  more  local  SEARCH  such  as 
the  type  suggested  in  [12].  Second,  the  set  of  polling  directions  would  be  enlarged  in  order 
to  avoid  large  gaps  in  the  directions  explored.  Finally,  the  polling  centers  would  sometimes 
be  the  feasible  incumbent,  and  sometimes  the  infeasible  one  with  least  constraint  violation 
value,  but  when  promising  filter  points  are  generated  (such  as  one  with  low  /  and  h  values), 
nothing  stops  the  SEARCH  step  from  including  an  unofficial  POLL  around  these  candidates 
as  a  part  of  the  search.  These  simple  algorithmic  enhancements  fit  in  the  general  description 
of  the  algorithm  presented  in  Section  4.3.  Even  with  these  improvements  one  could  devise 
twisted  examples  with  the  behavior  of  the  above  examples.  It  is  however  unlikely  that  such 
behavior  would  be  encountered  in  practice. 
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